# Models of ZFC

Model theory is really meta, so you will have to pay attention here.

Roughly, a “model of ${\mathsf{ZFC}}$” is a set with a binary relation that satisfies the ${\mathsf{ZFC}}$ axioms, just as a group is a set with a binary operation that satisfies the group axioms. Unfortunately, unlike with groups, it is very hard for me to give interesting examples of models, for the simple reason that we are literally trying to model the entire universe.

## 1. Models

Prototypical example for this section: ${(\omega, \in)}$ obeys ${\mathrm{PowerSet}}$, ${V_\kappa}$ is a model for ${\kappa}$ inaccessible (later).

Definition 1 A model ${\mathscr M}$ consists of a set ${M}$ and a binary relation ${E \subseteq M \times M}$. (The ${E}$ relation is the “${\in}$” for the model.)

Remark 2 I’m only considering set-sized models where ${M}$ is a set. Experts may be aware that I can actually play with ${M}$ being a class, but that would require too much care for now.

If you have a model, you can ask certain things about it. For example, you can ask “does it satisfy ${\mathrm{EmptySet}}$?”. Let me give you an example of what I mean, and then make it rigorous.

Example 3 (A Stupid Model) Let’s take ${\mathscr M = (M,E) = \left( \omega, \in \right)}$. This is not a very good model of ${\mathsf{ZFC}}$, but let’s see if we can make sense of some of the first few axioms.

1. ${\mathscr M}$ satisfies ${\mathrm{Extensionality}}$, which is the sentence

$\displaystyle \forall x \forall y \left( \forall a \left( a \in x \iff a \in y \right) \implies x = y \right).$

This just follows from the fact that ${E}$ is actually ${\in}$.

2. ${\mathscr M}$ satisfies ${\mathrm{EmptySet}}$, which is the sentence

$\displaystyle \exists a : \forall x \; \neg (x \in a).$

Namely, take ${a = \varnothing \in \omega}$.

3. ${\mathscr M}$ does not satisfy ${\mathrm{Pairing}}$, since ${\{1,3\}}$ is not in ${\omega}$, even though ${1, 3 \in \omega}$.
4. Miraculously, ${\mathscr M}$ satisfies ${\mathrm{Union}}$, since for any ${n \in \omega}$, ${\cup n}$ is ${n-1}$ (unless ${n=0}$, in which case ${\cup n = 0}$). The Union axiom states that

$\displaystyle \forall a \exists z \quad \forall x \; (x \in z) \iff (\exists y : x \in y \in a).$

An important thing to notice is that the “${\forall a}$” ranges only over the sets in the model of the universe, ${\mathscr M}$.
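To make the Pairing and Union claims concrete, here is a minimal Python sketch that builds the first few von Neumann naturals as nested `frozenset`s and checks both statements directly. The names `von_neumann`, `big_union` and the cutoff `LIMIT` are made up for this illustration, and of course only finitely many elements of ${\omega}$ are ever inspected.

```python
def von_neumann(n):
    """Return the von Neumann natural n = {0, 1, ..., n-1} as a frozenset."""
    current = frozenset()
    for _ in range(n):
        current = current | {current}  # successor: k + 1 = k union {k}
    return current


def big_union(x):
    """Return the union of all elements of x."""
    result = frozenset()
    for y in x:
        result |= y
    return result


LIMIT = 8  # hypothetical cutoff; omega itself is infinite
naturals = [von_neumann(k) for k in range(LIMIT)]

# Pairing fails: the set {1, 3} is not one of the naturals.
pair_1_3 = frozenset({naturals[1], naturals[3]})
pairing_witness_found = pair_1_3 in naturals

# Union holds: the union of n is n - 1 (and the union of 0 is 0).
union_ok = all(
    big_union(naturals[n]) == naturals[max(n - 1, 0)] for n in range(LIMIT)
)
```

Since the natural ${n}$ has exactly ${n}$ elements, the only candidate for ${\{1,3\}}$ would be ${2 = \{0,1\}}$, so looking at a finite initial segment loses nothing for the Pairing check.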

Example 4 (Important: This Stupid Model Satisfies ${\mathrm{PowerSet}}$) Most incredibly of all: ${\mathscr M = (\omega, \in)}$ satisfies ${\mathrm{PowerSet}}$. This is a really important example. You might think this is ridiculous. Look at ${2 = \{0,1\}}$. The power set of this is ${\{0, 1, 2, \{1\}\}}$ which is not in the model, right?

Well, let’s look more closely at ${\mathrm{PowerSet}}$. It states that:

$\displaystyle \forall x \exists a \forall y (y \in a \iff y \subseteq x).$

What happens if we set ${x = 2 = \{0,1\}}$? Well, actually, we claim that ${a = 3 = \{0,1,2\}}$ works. The key point is “for all ${y}$” — this only ranges over the objects in ${\mathscr M}$. In ${\mathscr M}$, the only subsets of ${2}$ are ${0 = \varnothing}$, ${1 = \{0\}}$ and ${2 = \{0,1\}}$. The “set” ${\{1\}}$ in the “real world” (in ${V}$) is not a set in the model ${\mathscr M}$.

In particular, you might say that in this strange new world, we have ${2^n = n+1}$, since ${n = \{0,1,\dots,n-1\}}$ really does have only ${n+1}$ subsets.
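The count ${n+1}$ can be checked mechanically. The sketch below lists which naturals are subsets of a given ${n}$, again using nested `frozenset`s. The function names and the `limit` parameter are assumptions of this sketch; the search is bounded, which is harmless here because a natural ${k}$ that is a subset of ${n}$ must satisfy ${k \le n}$.

```python
def von_neumann(n):
    """Return the von Neumann natural n = {0, 1, ..., n-1} as a frozenset."""
    current = frozenset()
    for _ in range(n):
        current = current | {current}  # successor: k + 1 = k union {k}
    return current


def subsets_inside_omega(n, limit):
    """List the k < limit whose von Neumann set is a subset of n."""
    target = von_neumann(n)
    return [k for k in range(limit) if von_neumann(k) <= target]


# The elements of the model that are subsets of 2 = {0, 1}.
subsets_of_two = subsets_inside_omega(2, limit=10)

# Collecting them gives exactly the natural 3, the model's "power set" of 2.
model_power_set = frozenset(von_neumann(k) for k in subsets_of_two)
```

Here `subsets_of_two` comes out as `[0, 1, 2]`, and `model_power_set` equals `von_neumann(3)`. The real-world subset ${\{1\}}$ never shows up because it is not a natural number at all.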

Example 5 (Sentences with Parameters) The sentences we ask of our model are allowed to have “parameters” as well. For example, if ${\mathscr M = (\omega, \in)}$ as before then ${\mathscr M}$ satisfies the sentence

$\displaystyle \forall x \in 3 (x \in 5).$

## 2. Sentences and Satisfaction

With this intuitive notion, we can define what it means for a model to satisfy a sentence.

Definition 6 Note that any sentence ${\phi}$ can be written in one of the following five forms:

• ${x \in y}$
• ${x = y}$
• ${\neg \psi}$ (“not ${\psi}$”) for some shorter sentence ${\psi}$
• ${\psi_1 \lor \psi_2}$ (“${\psi_1}$ or ${\psi_2}$”) for some shorter sentences ${\psi_1}$, ${\psi_2}$
• ${\exists x \psi}$ (“exists ${x}$”) for some shorter sentence ${\psi}$.

Ques 7 What happened to ${\land}$ (and) and ${\forall}$ (for all)? (Hint: use ${\neg}$.)

Often (almost always, actually) we will proceed by so-called “induction on formula complexity”, meaning that we define or prove something by induction using this. Note that we require all formulas to be finite.

Now suppose we have a sentence ${\phi}$, like ${a = b}$ or ${\exists a \forall x \neg (x \in a)}$, plus a model ${\mathscr M = (M,E)}$. We want to ask whether ${\mathscr M}$ satisfies ${\phi}$.

To give meaning to this, we have to designate certain variables as parameters. For example, if I asked you “Does ${a=b}$?” the first question you would ask is what ${a}$ and ${b}$ are. So ${a}$, ${b}$ would be parameters: I have to give them values for this sentence to make sense.

On the other hand, if I asked you “Does ${\exists a \forall x \neg (x \in a)}$?” then you would just say “yes”. In this case, ${x}$ and ${a}$ are not parameters. In general, parameters are those variables whose meaning is not given by some ${\forall}$ or ${\exists}$.

In what follows, we will let ${\phi(x_1, \dots, x_n)}$ denote a formula ${\phi}$ whose parameters are ${x_1, \dots, x_n}$. Note that possibly ${n=0}$; for example, all ${\mathsf{ZFC}}$ axioms have no parameters.

Ques 8 Try to guess the definition of satisfaction before reading it below. (It’s not very hard to guess!)

Definition 9 Let ${\mathscr M=(M,E)}$ be a model. Let ${\phi(x_1, \dots, x_n)}$ be a sentence, and let ${b_1, \dots, b_n \in M}$. We will define a relation

$\displaystyle \mathscr M \vDash \phi[b_1, \dots, b_n]$

and say ${\mathscr M}$ satisfies the sentence ${\phi}$ with parameters ${b_1, \dots, b_n}$.

The relationship is defined by induction on formula complexity as follows:

• If ${\phi}$ is “${x_1=x_2}$” then ${\mathscr M \vDash \phi[b_1, b_2] \iff b_1 = b_2}$.
• If ${\phi}$ is “${x_1\in x_2}$” then ${\mathscr M \vDash \phi[b_1, b_2] \iff b_1 \; E \; b_2}$.
(This is what we mean by “${E}$ interprets ${\in}$”.)
• If ${\phi}$ is “${\neg \psi}$” then ${\mathscr M \vDash \phi[b_1, \dots, b_n] \iff \mathscr M \not\vDash \psi[b_1, \dots, b_n]}$.
• If ${\phi}$ is “${\psi_1 \lor \psi_2}$” then ${\mathscr M \vDash \phi[b_1, \dots, b_n]}$ means ${\mathscr M \vDash \psi_i[b_1, \dots, b_n]}$ for some ${i=1,2}$.
• Most important case: suppose ${\phi}$ is ${\exists x \psi(x,x_1, \dots, x_n)}$. Then ${\mathscr M \vDash \phi[b_1, \dots, b_n]}$ if and only if

$\displaystyle \exists b \in M \text{ such that } \mathscr M \vDash \psi[b, b_1, \dots, b_n].$

Note that ${\psi}$ has one extra parameter.

Notice where the information of the model actually gets used. We only ever use ${E}$ in interpreting ${x_1 \in x_2}$; unsurprising. But we only ever use the set ${M}$ when we are running over ${\exists}$ (and hence ${\forall}$). That’s well worth keeping in mind: the behavior of a model essentially comes from ${\exists}$ and ${\forall}$, which search through the entire model ${M}$.
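For a finite model, the definition of satisfaction can be turned directly into a recursive procedure. The following Python sketch does this; the tuple encoding of formulas and the helper names `satisfies`, `forall` and `iff` are inventions of this sketch, not standard notation. It only works when ${M}$ is finite, since the ${\exists}$ case loops over all of ${M}$.

```python
def satisfies(M, E, phi, env=None):
    """Decide whether the finite model (M, E) satisfies phi under env.

    Formulas are nested tuples (an encoding invented for this sketch):
      ("in", x, y), ("eq", x, y), ("not", psi),
      ("or", psi1, psi2), ("exists", x, psi)
    where x, y are variable names and env maps parameters to elements of M.
    """
    env = env or {}
    tag = phi[0]
    if tag == "eq":
        return env[phi[1]] == env[phi[2]]
    if tag == "in":
        return (env[phi[1]], env[phi[2]]) in E
    if tag == "not":
        return not satisfies(M, E, phi[1], env)
    if tag == "or":
        return satisfies(M, E, phi[1], env) or satisfies(M, E, phi[2], env)
    if tag == "exists":
        # The only place where the underlying set M is consulted.
        return any(satisfies(M, E, phi[2], {**env, phi[1]: b}) for b in M)
    raise ValueError(f"unknown formula tag: {tag!r}")


def forall(x, psi):
    """Abbreviation: 'for all x, psi' is 'not exists x (not psi)'."""
    return ("not", ("exists", x, ("not", psi)))


def iff(p, q):
    """Abbreviation built from 'not' and 'or': (p and q) or (neither)."""
    both = ("not", ("or", ("not", p), ("not", q)))
    neither = ("not", ("or", p, q))
    return ("or", both, neither)


# Toy model: ({0, 1, 2, 3}, <), where k "contains" exactly the smaller numbers.
M = range(4)
E = {(a, b) for a in M for b in M if a < b}

empty_set = ("exists", "a", forall("x", ("not", ("in", "x", "a"))))
pairing = forall("x", forall("y", ("exists", "z", forall(
    "w", iff(("in", "w", "z"), ("or", ("eq", "w", "x"), ("eq", "w", "y")))
))))

print(satisfies(M, E, empty_set))  # True: take a = 0
print(satisfies(M, E, pairing))    # False: no element plays the role of {1, 3}
```

Note how `E` appears only in the `"in"` branch and `M` only in the `"exists"` branch, mirroring the remark above.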

And finally,

Definition 10 A model of ${\mathsf{ZFC}}$ is a model ${\mathscr M = (M,E)}$ satisfying all ${\mathsf{ZFC}}$ axioms.

We are especially interested in models of the form ${(M, \in)}$, where ${M}$ is a transitive set. (We want our universe to be transitive, otherwise we would have elements of sets which are not themselves in the universe, which is very strange.) Such a model is called a transitive model. If ${M}$ is a transitive set, the model ${(M, \in)}$ will be abbreviated to just ${M}$.

Definition 11 An inner model of ${\mathsf{ZFC}}$ is a transitive model satisfying ${\mathsf{ZFC}}$.

## 3. The Levy Hierarchy

Prototypical example for this section: ${\mathtt{isSubset}(x,y)}$ is absolute. The axiom ${\mathrm{EmptySet}}$ is ${\Sigma_1}$, ${\mathtt{isPowerSetOf}(X,x)}$ is ${\Pi_1}$.

A key point to remember is that the behavior of a model is largely determined by ${\exists}$ and ${\forall}$. It turns out we can say even more than this.

Consider a formula such as

$\displaystyle \mathtt{isEmpty}(x) : \neg \exists a (a \in x)$

which checks whether a given set ${x}$ is empty, i.e. has no elements. Technically, this has an “${\exists}$” in it. But somehow this ${\exists}$ does not really search over the entire model, because it is bounded to search in ${x}$. That is, we might informally rewrite this as

$\displaystyle \neg (\exists a \in x)$

which doesn’t fit into the strict form, but points out that we are only looking over ${a \in x}$. We call such a quantifier a bounded quantifier.

We like sentences with bounded quantifiers because they designate properties which are absolute over transitive models. It doesn’t matter how strange your surrounding model ${M}$ is. As long as ${M}$ is transitive,

$\displaystyle M \vDash \mathtt{isEmpty}(\varnothing)$

will always hold. Similarly, the following sentence is absolute:

$\displaystyle \mathtt{isSubset}(x,y) : x \subseteq y \text { i.e. } \forall a \in x (a \in y).$

Sentences all of whose quantifiers are bounded in this way are called ${\Sigma_0}$ or ${\Pi_0}$.

The situation is different with a sentence like

$\displaystyle \mathtt{isPowerSetOf}(y,x) : \forall z \left( z \subseteq x \iff z \in y \right)$

which in English means “${y}$ is the power set of ${x}$”, or just ${y = \mathcal P(x)}$. The ${\forall z}$ is not bounded here. This weirdness is what allows things like

$\displaystyle \omega \vDash \text{``} \{0,1,2\} \text{ is the power set of }\{0,1\}\text{''}$

and hence

$\displaystyle \omega \vDash \mathrm{PowerSet}$

which was our stupid example earlier. The sentence ${\mathtt{isPowerSetOf}}$ consists of an unbounded ${\forall}$ followed by an absolute sentence, so we say it is ${\Pi_1}$.

More generally, the Levy hierarchy keeps track of how bounded our quantifiers are. Specifically,

• Formulas which have only bounded quantifiers are ${\Delta_0 = \Sigma_0 = \Pi_0}$.
• Formulas of the form ${\exists x_1 \dots \exists x_k \psi}$ where ${\psi}$ is ${\Pi_n}$ are considered ${\Sigma_{n+1}}$.
• Formulas of the form ${\forall x_1 \dots \forall x_k \psi}$ where ${\psi}$ is ${\Sigma_n}$ are considered ${\Pi_{n+1}}$.

(A formula which is both ${\Sigma_n}$ and ${\Pi_n}$ is called ${\Delta_n}$, but we won’t use this except for ${n=0}$.)

Example 12 (Examples of ${\Delta_0}$ Sentences)

1. The sentences ${\mathtt{isEmpty}(x)}$, ${x \subseteq y}$, as discussed above.
2. The formula “${x}$ is transitive” can be expanded as a ${\Delta_0}$ sentence.
3. The formula “${x}$ is an ordinal” can be expanded as a ${\Delta_0}$ sentence.

Exercise 13 Write out the expansions for “${x}$ is transitive” and “${x}$ is an ordinal” in a ${\Delta_0}$ form.

Example 14 (More Complex Formulas)

1. The axiom ${\mathrm{EmptySet}}$ is ${\Sigma_1}$; it is ${\exists a (\mathtt{isEmpty}(a))}$, and ${\mathtt{isEmpty}(a)}$ is ${\Delta_0}$.
2. The formula “${y = \mathcal P(x)}$” is ${\Pi_1}$, as discussed above.
3. The formula “${x}$ is countable” is ${\Sigma_1}$. One way to phrase it is “${\exists f}$ an injective map ${x \hookrightarrow \omega}$”, which necessarily has an unbounded “${\exists f}$”.
4. The axiom ${\mathrm{PowerSet}}$ is ${\Pi_3}$:

$\displaystyle \forall y \exists P \forall x (x\subseteq y \iff x \in P).$
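The bookkeeping in these examples is just counting alternating blocks of unbounded quantifiers. As a small sketch, the hypothetical helper `levy_class` below takes the unbounded quantifier prefix as a string of `E` (for ${\exists}$) and `A` (for ${\forall}$), read from the outside in, and assumes whatever follows the prefix is ${\Delta_0}$. It only reads off the syntactic form; a formula may well be equivalent to something lower in the hierarchy.

```python
def levy_class(prefix):
    """Classify a prenex formula by its unbounded quantifier prefix.

    `prefix` is a string over {"E", "A"} listing the unbounded quantifiers
    from the outside in; the rest of the formula is assumed to be Delta_0.
    """
    if any(q not in "EA" for q in prefix):
        raise ValueError("prefix must consist of 'E' and 'A' only")
    if not prefix:
        return "Delta_0"
    # Count maximal blocks of identical quantifiers.
    blocks = 1 + sum(1 for a, b in zip(prefix, prefix[1:]) if a != b)
    family = "Sigma" if prefix[0] == "E" else "Pi"
    return f"{family}_{blocks}"


print(levy_class("E"))    # Sigma_1, e.g. EmptySet
print(levy_class("A"))    # Pi_1, e.g. isPowerSetOf
print(levy_class("AEA"))  # Pi_3, e.g. PowerSet as written above
```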

## 4. Substructures, and Tarski-Vaught

Let ${\mathscr M_1 = (M_1, E_1)}$ and ${\mathscr M_2 = (M_2, E_2)}$ be models.

Definition 15 We say that ${\mathscr M_1 \subseteq \mathscr M_2}$ if ${M_1 \subseteq M_2}$ and ${E_1}$ agrees with ${E_2}$; we say ${\mathscr M_1}$ is a substructure of ${\mathscr M_2}$.

That’s boring. The good part is:

Definition 16 We say ${\mathscr M_1 \prec \mathscr M_2}$, or ${\mathscr M_1}$ is an elementary substructure of ${\mathscr M_2}$, if for every sentence ${\phi(x_1, \dots, x_n)}$ and parameters ${b_1, \dots, b_n \in M_1}$, we have

$\displaystyle \mathscr M_1 \vDash \phi[b_1, \dots, b_n] \iff \mathscr M_2 \vDash \phi[b_1, \dots, b_n].$

In other words, ${\mathscr M_1}$ and ${\mathscr M_2}$ agree on every sentence possible. Note that the ${b_i}$ have to come from ${M_1}$; if the ${b_i}$ came from ${\mathscr M_2}$ then asking something of ${\mathscr M_1}$ wouldn’t make sense.

Let’s ask now: how would ${\mathscr M_1 \prec \mathscr M_2}$ fail to be true? If we look at the possible sentences, none of the atomic formulas, nor the “${\lor}$” and “${\neg}$”, are going to cause issues.

The intuition you should be getting by now is that things go wrong once we hit ${\forall}$ and ${\exists}$. They won’t go wrong for bounded quantifiers. But unbounded quantifiers search the entire model, and that’s where things go wrong.

To give a “concrete example”: imagine ${\mathscr M_1}$ is MIT, and ${\mathscr M_2}$ is the state of Massachusetts. If ${\mathscr M_1}$ thinks there exist hackers at MIT, certainly there exist hackers in Massachusetts. Where things go wrong is something like:

$\displaystyle \mathscr M_2 \vDash \text{``} \exists x : x \text{ is a course numbered }> 50\text{''}.$

This is true for ${\mathscr M_2}$ because we can take the witness ${x = \text{Math 55}}$, say. But it’s false for ${\mathscr M_1}$, because at MIT all courses are numbered ${18.701}$ or something similar. The issue is that witnesses for statements in ${\mathscr M_2}$ do not necessarily propagate down to witnesses in ${\mathscr M_1}$, even though they do propagate up from ${\mathscr M_1}$ to ${\mathscr M_2}$.

The Tarski-Vaught test says this is the only impediment: if every witness in ${\mathscr M_2}$ can be replaced by one in ${\mathscr M_1}$ then ${\mathscr M_1 \prec \mathscr M_2}$.

Lemma 17 (Tarski-Vaught) Let ${\mathscr M_1 \subseteq \mathscr M_2}$. Then ${\mathscr M_1 \prec \mathscr M_2}$ if and only if for every sentence ${\phi(x, x_1, \dots, x_n)}$ and parameters ${b_1, \dots, b_n \in M_1}$: if there is a witness ${\tilde b \in M_2}$ to ${\mathscr M_2 \vDash \phi[\tilde b, b_1, \dots, b_n]}$ then there is a witness ${b \in M_1}$ to ${\mathscr M_2 \vDash \phi[b, b_1, \dots, b_n]}$.

Proof: Easy after the above discussion. To formalize it, use induction on formula complexity. $\Box$

## 5. Obtaining the Axioms of ${\mathsf{ZFC}}$

Extending the above ideas, one can obtain without much difficulty the following. The idea is that almost all the ${\mathsf{ZFC}}$ axioms are just ${\Sigma_1}$ claims about certain desired sets, and so verifying an axiom reduces to checking some appropriate “closure” condition: that the witness to the axiom is actually in the model.

For example, the ${\mathrm{EmptySet}}$ axiom is “${\exists a (\mathtt{isEmpty}(a))}$”, and so we’re happy as long as ${\varnothing \in M}$, which is of course true for any nonempty transitive set ${M}$.

Lemma 18 (Transitive Sets Inheriting ${\mathsf{ZFC}}$) Let ${M}$ be a nonempty transitive set. Then

1. ${M}$ satisfies ${\mathrm{Extensionality}}$, ${\mathrm{Foundation}}$, ${\mathrm{EmptySet}}$.
2. ${M \vDash \mathrm{Pairing}}$ if ${x,y \in M \implies \{x,y\} \in M}$.
3. ${M \vDash \mathrm{Union}}$ if ${x \in M \implies \cup x \in M}$.
4. ${M \vDash \mathrm{PowerSet}}$ if ${x \in M \implies \mathcal P(x) \cap M \in M}$.
5. ${M \vDash \mathrm{Replacement}}$ if for every ${x \in M}$ and every function ${F : x \rightarrow M}$ which is ${M}$-definable with parameters, the image ${F\text{``}x = \{F(y) \mid y \in x\}}$ lies in ${M}$ as well.
6. ${M \vDash \mathrm{Infinity}}$ as long as ${\omega \in M}$.

Here, a set ${X \subseteq M}$ is ${M}$-definable with parameters if it can be realized as

$\displaystyle X = \left\{ x \in M \mid \phi[x, b_1, \dots, b_n] \right\}$

for some (fixed) choice of parameters ${b_1,\dots,b_n \in M}$. We allow ${n=0}$, in which case we say ${X}$ is ${M}$-definable without parameters. Note that ${X}$ need not itself be in ${M}$! As a trivial example, ${X = M}$ is ${M}$-definable without parameters (just take ${\phi[x]}$ to always be true), and certainly we do not have ${X \in M}$.
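The closure conditions in the lemma are easy to test on small hereditarily finite sets. Here is a sketch that builds the finite stage ${V_4}$ (16 elements) and checks transitivity together with the closure conditions for Pairing, Union and PowerSet. All function names are made up for this illustration, and nothing here says anything about infinite ${M}$.

```python
from itertools import combinations


def powerset(s):
    """Return the set of all subsets of the frozenset s."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )


def V(n):
    """Return the finite stage V_n of the cumulative hierarchy."""
    stage = frozenset()
    for _ in range(n):
        stage = powerset(stage)
    return stage


def is_transitive(M):
    """Every element of M is also a subset of M."""
    return all(x <= M for x in M)


def closed_under_pairing(M):
    return all(frozenset({x, y}) in M for x in M for y in M)


def closed_under_union(M):
    return all(frozenset().union(*x) in M for x in M)


def closed_under_relative_powerset(M):
    return all((powerset(x) & M) in M for x in M)


M = V(4)
print(len(M), is_transitive(M))            # 16 True
print(closed_under_union(M))               # True
print(closed_under_pairing(M))             # False: {V_3} is missing
print(closed_under_relative_powerset(M))   # False: P(V_3) = V_4 is missing
```

So ${V_4}$ is transitive and passes the Union condition, but it is too short to pass the Pairing or PowerSet conditions.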

Exercise 19 Verify (1)-(4) above.

Remark 20 Converses to the statements of Lemma 18 are true for all claims other than (6).

## 6. Mostowski Collapse

Up until now I have been only talking about transitive models, because they were easier to think about. Here’s a second, better reason we might only care about transitive models.

Lemma 21 (Mostowski Collapse) Let ${\mathscr X = (X,E)}$ be a model such that ${\mathscr X \vDash \mathrm{Extensionality} + \mathrm{Foundation}}$ and the relation ${E}$ is genuinely well-founded (which is automatic when ${E}$ is ${\in}$). Then there exists an isomorphism ${\pi : \mathscr X \rightarrow M}$ for a transitive model ${M = (M,\in)}$.

This is also called the transitive collapse. In fact, both ${\pi}$ and ${M}$ are unique.

Proof: The idea behind the proof is very simple. Since ${E}$ is well-founded and extensional, we can look at the ${E}$-minimal element ${x_\varnothing}$ of ${X}$. Clearly, we want to send that to ${0 = \varnothing}$.

Then we take the next-smallest set under ${E}$, and send it to ${1 = \{\varnothing\}}$. We “keep doing this”; it’s not hard to see this does exactly what we want.

To formalize, define ${\pi}$ by transfinite recursion:

$\displaystyle \pi(x) \overset{\mathrm{def}}{=} \left\{ \pi(y) \mid y \; E \; x \right\}.$

This ${\pi}$, by construction, does the trick. $\Box$

The picture here is one of “collapsing” the elements of ${X}$ down to the bottom of ${V}$, hence the name.
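On a finite model the recursion ${\pi(x) = \{\pi(y) \mid y \; E \; x\}}$ can be run literally. The sketch below does so with memoization; the function name `mostowski_collapse`, the representation of ${E}$ as a set of pairs, and the three-element example are assumptions of this illustration. It takes for granted that the input is well-founded and extensional and does not check this.

```python
def mostowski_collapse(X, E):
    """Return the map pi with pi(x) = {pi(y) : y E x}, as a dict.

    Assumes (X, E) is finite, well-founded and extensional; E is a set of
    pairs (y, x) meaning "y E x".
    """
    members = {x: [y for y in X if (y, x) in E] for x in X}
    pi = {}

    def collapse(x):
        if x not in pi:
            # Well-foundedness guarantees this recursion terminates.
            pi[x] = frozenset(collapse(y) for y in members[x])
        return pi[x]

    for x in X:
        collapse(x)
    return pi


# A three-element model whose elements are just labels.
X = ["a", "b", "c"]
E = {("a", "b"), ("a", "c"), ("b", "c")}
pi = mostowski_collapse(X, E)

zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
print(pi["a"] == zero, pi["b"] == one, pi["c"] == two)  # True True True
```

The labels `a`, `b`, `c` carried no set-theoretic content of their own; after collapsing, the model has become the transitive set ${3 = \{0, 1, 2\}}$ with the real ${\in}$.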

## 7. Adding an Inaccessible, Skolem Hulls, and Going Insane

Prototypical example for this section: ${V_\kappa}$

At this point you might be asking, well, where’s my model of ${\mathsf{ZFC}}$?

I unfortunately have to admit now: ${\mathsf{ZFC}}$ can never prove that there is a model of ${\mathsf{ZFC}}$ (unless ${\mathsf{ZFC}}$ is inconsistent, but that would be even worse). This is a consequence of Gödel’s Second Incompleteness Theorem.

Nonetheless, with some very modest assumptions added, we can actually show that a model does exist. For example, assuming that there exists a strongly inaccessible cardinal ${\kappa}$ would do the trick: it turns out ${V_\kappa}$ will be such a model. Intuitively you can see why: ${\kappa}$ is so big that any set of rank lower than it can’t escape it even if we take its power set, or apply any other operation that ${\mathsf{ZFC}}$ lets us do.

More pessimistically, this shows that it’s impossible to prove in ${\mathsf{ZFC}}$ that such a ${\kappa}$ exists. Nonetheless, we now proceed under ${\mathsf{ZFC}^+}$ for convenience, which adds the existence of such a ${\kappa}$ as a final axiom. So we now have a model ${V_\kappa}$ to play with. Joy!

Great. Now we do something really crazy.

Theorem 22 (Countable Transitive Model) Assume ${\mathsf{ZFC}^+}$. Then there exists a transitive model ${M}$ of ${\mathsf{ZFC}}$ such that ${M}$ is a countable set.

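Proof: By ${\mathsf{ZFC}^+}$ we have the model ${V_\kappa}$; throughout this proof, ${M}$ denotes ${V_\kappa}$ (the countable model promised by the theorem only appears at the very end). Start with the set ${X_0 = \varnothing}$. Then for every integer ${n \ge 0}$, we do the following to get ${X_{n+1}}$.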

• Start with ${X_{n+1}}$ containing every element of ${X_n}$.
• Consider a formula ${\phi(x, x_1, \dots, x_k)}$ and ${b_1, \dots, b_k}$ in ${X_n}$. Suppose that ${M}$ thinks there is a ${b \in M}$ for which

$\displaystyle M \vDash \phi[b, b_1, \dots, b_k].$

We then add in the element ${b}$ to ${X_{n+1}}$.

• We do this for every possible formula in the language of set theory. We also have to put in every possible set of parameters from the previous set ${X_n}$.

At every step ${X_n}$ is countable. Reason: there are countably many possible finite sets of parameters in ${X_n}$, and countably many possible formulas, so in total we only ever add in countably many things at each step. This exhibits an infinite nested sequence of countable sets

$\displaystyle X_0 \subseteq X_1 \subseteq X_2 \subseteq \dots$

None of these is necessarily an elementary substructure of ${M}$, because each ${X_n}$ relies on witnesses in ${X_{n+1}}$. So we instead take the union:

$\displaystyle X = \bigcup_n X_n.$

This satisfies the Tarski-Vaught test, and is countable.

There is one minor caveat: ${X}$ might not be transitive. We don’t care, because we just take its Mostowski collapse. $\Box$

Please take a moment to admire how insane this is. It hinges irrevocably on the fact that there are countably many sentences we can write down.

Remark 23 This proof relies heavily on the Axiom of Choice when we add in the element ${b}$ to ${X_{n+1}}$. Without Choice, there is no way of making these decisions all at once.

Usually, the right way to formalize the Axiom of Choice usage is, for every formula ${\phi(x, x_1, \dots, x_n)}$, to pre-commit (at the very beginning) to a function ${f_\phi(x_1, \dots, x_n)}$ such that, given any ${b_1, \dots, b_n}$, the function ${f_\phi(b_1, \dots, b_n)}$ will spit out a suitable value of ${b}$ (if one exists). Personally, I think this is hiding the spirit of the proof, but it does make it clear how exactly Choice is being used.

These ${f_\phi}$’s have a name: Skolem functions.
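Seen through Skolem functions, the construction ${X_0 \subseteq X_1 \subseteq \dots}$ is just “close a set under a family of functions”. Here is a toy sketch of that closure loop. Everything in it is a stand-in: the real proof has one function per formula and lives in ${V_\kappa}$, whereas here `toy_functions` is a hypothetical pair of functions on the integers mod 12, chosen only so the loop visibly stabilizes.

```python
from itertools import product


def skolem_hull(A, skolem_functions, max_rounds=1000):
    """Close A under the given functions, mimicking X_0, X_1, X_2, ...

    skolem_functions is a list of (arity, function) pairs standing in for
    the f_phi; the real proof uses one function per formula.
    """
    hull = set(A)
    for _ in range(max_rounds):
        new = set(hull)
        for arity, f in skolem_functions:
            # Feed every tuple of parameters from the current stage.
            for args in product(hull, repeat=arity):
                new.add(f(*args))
        if new == hull:  # nothing was added: the union has stabilized
            return hull
        hull = new
    raise RuntimeError("closure did not stabilize within max_rounds")


# Hypothetical stand-ins for Skolem functions, on the integers mod 12.
toy_functions = [
    (0, lambda: 3),                # a parameter-free witness
    (1, lambda x: (2 * x) % 12),   # a witness depending on one parameter
]

print(sorted(skolem_hull(set(), toy_functions)))  # [0, 3, 6]
```

Starting from the empty set still produces something, because parameter-free formulas contribute witnesses at the first step, just as in the proof above.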

The trick we used in the proof works in more general settings:

Theorem 24 (Downward Löwenheim-Skolem Theorem) Let ${\mathscr M = (M,E)}$ be a model, and ${A \subseteq M}$. Then there exists a set ${B}$ (called the Skolem hull of ${A}$) with ${A \subseteq B \subseteq M}$, such that ${(B,E) \prec \mathscr M}$, and

$\displaystyle \left\lvert B \right\rvert \le \max \left\{ \omega, \left\lvert A \right\rvert \right\}.$

In our case, what we did was simply take ${A}$ to be the empty set.

Ques 25 Prove this. (Exactly the same proof as before.)

## 8. FAQ’s on Countable Models

The most common one is “how is this possible?”, with runner-up “what just happened”.

Let me do my best to answer the first question. It seems like there are two things running up against each other:

1. ${M}$ is a transitive model of ${\mathsf{ZFC}}$, but its universe is countable.
2. ${\mathsf{ZFC}}$ tells us there are uncountable sets!

(This has confused so many people it has a name, Skolem’s paradox.)

The reason this works I actually pointed out earlier: countability is not absolute, it is a ${\Sigma_1}$ notion.

Recall that a set ${x}$ is countable if there exists an injective map ${x \hookrightarrow \omega}$. The first statement just says that in the universe ${V}$, there is an injective map ${F: M \hookrightarrow \omega}$. In particular, for any ${x \in M}$ (hence ${x \subseteq M}$, since ${M}$ is transitive), ${x}$ is countable in ${V}$. This is the content of the first statement.

But for ${M}$ to be a model of ${\mathsf{ZFC}}$, ${M}$ only has to think statements in ${\mathsf{ZFC}}$ are true. More to the point, the fact that ${\mathsf{ZFC}}$ tells us there are uncountable sets means

$\displaystyle M \vDash \exists x \text{ uncountable}.$

In other words,

$\displaystyle M \vDash \exists x \forall f \text{ If } f : x \rightarrow \omega \text{ then } f \text{ isn't injective}.$

The key point is the ${\forall f}$ searches only functions in our tiny model ${M}$. It is true that in the “real world” ${V}$, there are injective functions ${f : x \rightarrow \omega}$. But ${M}$ has no idea they exist! It is a brain in a vat: ${M}$ is oblivious to any information outside it.

So in fact, every ordinal which appears in ${M}$ is countable in the real world. It is just not countable in ${M}$. Since ${M \vDash \mathsf{ZFC}}$, ${M}$ is going to think there is some smallest uncountable cardinal, say ${\aleph_1^M}$. It will be the smallest (infinite) ordinal in ${M}$ with the property that there is no bijection in the model ${M}$ between ${\aleph_1^M}$ and ${\omega}$. However, we necessarily know that such a bijection is going to exist in the real world ${V}$.

Put another way, cardinalities in ${M}$ can look vastly different from those in the real world, because cardinality is measured by bijections, which I guess is inevitable, but leads to chaos.

## 9. Picturing Inner Models

Here is a picture of a countable transitive model ${M}$.

Note that ${M}$ and ${V}$ must agree on finite sets, since every finite set has a formula that can express it. However, past ${V_\omega}$ the model and the true universe start to diverge.

The entire model ${M}$ is countable, so it only occupies a small portion of the universe, below the first uncountable cardinal ${\aleph_1^V}$ (where the superscript means “of the true universe ${V}$”). The ordinals in ${M}$ are precisely the ordinals of ${V}$ which happen to live inside the model, because the sentence “${\alpha}$ is an ordinal” is absolute. On the other hand, ${M}$ has only a portion of these ordinals, since it is only a lowly set, and a countable set at that. To denote the ordinals of ${M}$, we write ${\mathrm{On}^M}$, where the superscript means “the ordinals as computed in ${M}$”. Similarly, ${\mathrm{On}^V}$ will now denote the “set of true ordinals”.

Nonetheless, the model ${M}$ has its own version of the first uncountable cardinal ${\aleph_1^M}$. In the true universe, ${\aleph_1^M}$ is countable (below ${\aleph_1^V}$), but the bijection witnessing this is not inside ${M}$. That’s why ${M}$ can think ${\aleph_1^M}$ is uncountable, even though it is a countable ordinal in the original universe.

So our model ${M}$ is a brain in a vat. It happens to believe all the axioms of ${\mathsf{ZFC}}$, and so every statement that is true in ${M}$ could conceivably be true in ${V}$ as well. But ${M}$ can’t see the universe around it; it has no idea that what it believes is the uncountable ${\aleph_1^M}$ is really just an ordinary countable ordinal.

## 10. Exercises

Problem 1 Show that for any transitive model ${M}$, the set of ordinals in ${M}$ is itself some ordinal.

Problem 2 Assume ${\mathscr M_1 \subseteq \mathscr M_2}$. Show that

1. If ${\phi}$ is ${\Delta_0}$, then ${\mathscr M_1 \vDash \phi[b_1, \dots, b_n] \iff \mathscr M_2 \vDash \phi[b_1, \dots, b_n]}$.
2. If ${\phi}$ is ${\Sigma_1}$, then ${\mathscr M_1 \vDash \phi[b_1, \dots, b_n] \implies \mathscr M_2 \vDash \phi[b_1, \dots, b_n]}$.
3. If ${\phi}$ is ${\Pi_1}$, then ${\mathscr M_2 \vDash \phi[b_1, \dots, b_n] \implies \mathscr M_1 \vDash \phi[b_1, \dots, b_n]}$.

Problem 3 (Reflection) Let ${\kappa}$ be an inaccessible cardinal such that ${|V_\alpha| < \kappa}$ for all ${\alpha < \kappa}$. Prove that for any ${\delta < \kappa}$ there exists ${\delta < \alpha < \kappa}$ such that ${V_\alpha \prec V_\kappa}$; in other words, the set of ${\alpha}$ such that ${V_\alpha \prec V_\kappa}$ is unbounded in ${\kappa}$. This means that properties of ${V_\kappa}$ reflect down to properties of ${V_\alpha}$.

Problem 4 (Inaccessible Cardinals Produce Models) Let ${\kappa}$ be an inaccessible cardinal. Prove that ${V_\kappa}$ is a model of ${\mathsf{ZFC}}$.

# Cauchy’s Functional Equation and Zorn’s Lemma

This is a draft of an appendix chapter for my Napkin project.

In the world of olympiad math, there’s a famous functional equation that goes as follows:

$\displaystyle f : {\mathbb R} \rightarrow {\mathbb R} \qquad f(x+y) = f(x) + f(y).$

Everyone knows what its solutions are! There’s an obvious family of solutions ${f(x) = cx}$. Then there’s also this family of… uh… noncontinuous solutions (mumble grumble) pathological (mumble mumble) Axiom of Choice (grumble).

Don’t worry, I know what I’m doing!

There’s also this thing called Zorn’s Lemma. It sounds terrifying, because it’s equivalent to the Axiom of Choice, which is also terrifying because why not.

In this post I will try to de-terrify these things, because they’re really not terrifying and I’m not sure why no one bothered to explain this properly yet. I have yet to see an olympiad handout that explains how you would construct a pathological solution, even though it’s really quite natural. So let me fix this problem now…

## 1. Let’s Construct a Monster

Let’s try to construct a “bad” function ${f}$ and see what happens.

By scaling, let’s assume WLOG that ${f(1) = 1}$. Thus ${f(n) = n}$ for every integer ${n}$, and you can easily show from here that

$\displaystyle f\left( \frac mn \right) = \frac mn.$

So ${f}$ is determined for all rationals. And then you get stuck.

None of this is useful for determining, say, ${f(\sqrt 2)}$. You could add and subtract rational numbers all day and, say, ${\sqrt 2}$ isn’t going to show up at all.

Well, we’re trying to set things on fire anyways, so let’s set

$\displaystyle f(\sqrt 2) = 2015$

because why not? By the same induction, we get ${f(n\sqrt2) = 2015n}$, and then that

$\displaystyle f\left( a + b \sqrt 2 \right) = a + 2015b.$

Here ${a}$ and ${b}$ are rationals. Well, so far so good — as written, this is a perfectly good solution, other than the fact that we’ve only defined ${f}$ on a tiny portion of the real numbers.

Well, we can do this all day:

$\displaystyle f\left( a + b \sqrt 2 + c \sqrt 3 + d \pi \right) = a + 2015b + 1337c - 999d.$

Perfectly consistent.
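To see that the partial definition really is additive, we can store a number ${a + b\sqrt 2 + c\sqrt 3 + d\pi}$ as its tuple of rational coefficients and compute with exact fractions. This is a sketch under the same assumption the text is quietly making: that ${1, \sqrt 2, \sqrt 3, \pi}$ are “independent”, so the coefficient tuple of a number is well defined. The names `BASIS_VALUES`, `f` and `add` are chosen for this sketch.

```python
from fractions import Fraction

# Values assigned to the base numbers 1, sqrt(2), sqrt(3), pi, as in the text.
BASIS_VALUES = (Fraction(1), Fraction(2015), Fraction(1337), Fraction(-999))


def f(coeffs):
    """Evaluate f at a + b*sqrt(2) + c*sqrt(3) + d*pi, coeffs = (a, b, c, d)."""
    return sum(q * v for q, v in zip(coeffs, BASIS_VALUES))


def add(u, v):
    """Add two numbers given by their rational coefficient tuples."""
    return tuple(p + q for p, q in zip(u, v))


x = (Fraction(1, 2), Fraction(3), Fraction(0), Fraction(-2))
y = (Fraction(5), Fraction(-1, 3), Fraction(7), Fraction(1))

additive = f(add(x, y)) == f(x) + f(y)
print(additive)              # True
print(f((0, 1, 0, 0)))       # 2015, the value chosen for sqrt(2)
```

Additivity holds for every pair of tuples, not just this one, because `f` is linear in the coefficients. The function is still far from ${f(x) = cx}$, since ${f(1) = 1}$ but ${f(\sqrt 2) = 2015}$.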

You can kind of see how we should keep going now. Just keep throwing in new real numbers which are “independent” of the previous few, assigning them to whatever junk we want. It feels like it should be workable…

In a moment I’ll explain what “independent” means (though you might be able to guess already), but at the moment there’s a bigger issue: no matter how many numbers we throw in, it seems like we’ll never finish. Let’s address the second issue first.

## 2. Review of Finite Induction

When you do induction, you get to count off ${1}$, ${2}$, ${3}$, … and so on. So for example, suppose we had a “problem” such as the following: Prove that the intersection of ${n}$ open intervals is either ${\varnothing}$ or an open interval. You can do this by induction easily: it’s true for ${n = 2}$, and for the larger cases it’s similarly easy.

But you can’t conclude from this that infinitely many open intervals intersect at some open interval. Indeed, this is false: consider the intervals

$\displaystyle \left( -1, 1 \right), \quad \left( -\frac12, \frac12 \right), \quad \left( -\frac13, \frac13 \right), \quad \left( -\frac14, \frac14 \right), \quad \dots$

This infinite set of intervals intersects at a single point ${\{0\}}$!
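A quick computation shows the finite stages behaving well while the limit does not. The sketch below intersects the first ${n}$ of these intervals using exact fractions; the helper names `intersect` and `shrinking` are made up here.

```python
from fractions import Fraction


def intersect(intervals):
    """Intersect finitely many open intervals (lo, hi); None if empty."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return (lo, hi) if lo < hi else None


def shrinking(n):
    """The first n intervals (-1/k, 1/k) for k = 1, ..., n."""
    return [(Fraction(-1, k), Fraction(1, k)) for k in range(1, n + 1)]


print(intersect(shrinking(5)))  # (Fraction(-1, 5), Fraction(1, 5))
```

For every finite ${n}$ the answer is the open interval ${(-\frac1n, \frac1n)}$, but its width ${\frac2n}$ tends to ${0}$, so nothing of positive width survives all the stages at once.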

The moral of the story is that induction doesn’t let us reach infinity. Too bad, because we’d have loved to use induction to help us construct a monster. That’s what we’re doing, after all — adding things in one by one.

## 3. Transfinite Induction

Well, it turns out we can, but we need a new notion of number.

For this we need a notion of an ordinal number. I defined these in their full glory in a previous blog post, but I don’t need the full details of that. Here’s what I want to say: after all the natural numbers

$\displaystyle 0, \; 1, \; \dots,$

I’ll put a new number called ${\omega}$, representing how large the natural numbers are. After that come the numbers

$\displaystyle \omega+1, \; \omega+2, \; \dots$

and eventually a number called ${2\omega}$.

The list goes on:

$\displaystyle \begin{aligned} 0, & 1, 2, 3, \dots, \omega \\ & \omega+1, \omega+2, \dots, \omega+\omega \\ & 2\omega+1, 2\omega+2, \dots, 3\omega \\ & \vdots \\ & \omega^2 + 1, \omega^2+2, \dots \\ & \vdots \\ & \omega^3, \dots, \omega^4, \dots, \omega^\omega \\ & \vdots \\ & \omega^{\omega^{\omega^{\dots}}} \end{aligned}$

Pictorially, it kind of looks like this:

Anyways, in the same way that natural numbers dominate all finite sets, the ordinals dominate all the sets.

Theorem 1 For every set ${S}$ there’s some ordinal ${\alpha}$ which is bigger than it.

But it turns out (and you can intuitively see) that as large as the ordinals grow, there is no infinite descending chain. Meaning: if I start at an ordinal (like ${2 \omega + 4}$) and jump down, I can only take finitely many jumps before I hit ${0}$. (To see this, try writing down a chain starting at ${2 \omega + 4}$ yourself.) Hence, induction and recursion still work verbatim:

Theorem 2 Given a statement ${P(-)}$, suppose that

• ${P(0)}$ is true, and
• If ${P(\alpha)}$ is true for all ${\alpha < \beta}$, then ${P(\beta)}$ is true.

Then ${P(\beta)}$ is true for every ordinal ${\beta}$.

Similarly, you’re allowed to do recursion to define ${x_\beta}$ if you know the value of ${x_\alpha}$ for all ${\alpha < \beta}$.

The difference from normal induction or recursion is that there we often only do things like “define ${x_{n+1} = \dots}$”. But this is not enough to define ${x_\alpha}$ for all ${\alpha}$. To see this, try using our normal induction and see how far we can climb up the ladder.

Answer: you can’t get ${\omega}$! It’s not of the form ${n+1}$ for any of our natural numbers ${n}$; our finite induction only lets us get up to the ordinals less than ${\omega}$. Similarly, the simple ${+1}$ doesn’t let us hit the ordinal ${2\omega}$, even if we already have ${\omega+n}$ for all ${n}$. Such ordinals are called limit ordinals. The ordinals that are of the form ${\alpha+1}$ are called successor ordinals.

So a transfinite induction or recursion is very often broken up into three cases. In the induction phrasing, it looks like

• (Zero Case) First, resolve ${P(0)}$.
• (Successor Case) Show that from ${P(\alpha)}$ we can get ${P(\alpha+1)}$.
• (Limit Case) Show that ${P(\lambda)}$ holds given ${P(\alpha)}$ for all ${\alpha < \lambda}$, where ${\lambda}$ is a limit ordinal.

Similarly, transfinite recursion often is split into cases too.

• (Zero Case) First, define ${x_0}$.
• (Successor Case) Define ${x_{\alpha+1}}$ from ${x_\alpha}$.
• (Limit Case) Define ${x_\lambda}$ from ${x_\alpha}$ for all ${\alpha < \lambda}$, where ${\lambda}$ is a limit ordinal.

In both situations, finite induction only does the first two cases, but if we’re able to do the third case we can climb far above the barrier ${\omega}$.
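To make the three cases concrete, here is a small sketch that only looks at the ordinals below ${\omega^2}$. The encoding is my own assumption for illustration: a pair `(a, b)` stands for the ordinal written ${a\omega + b}$ in the list above, and the helper name `kind` is made up.

```python
from typing import Tuple

# Hypothetical encoding: (a, b) stands for "a copies of omega, then b more steps".
Ordinal = Tuple[int, int]


def kind(alpha: Ordinal) -> str:
    """Classify an ordinal below omega^2 as zero, successor, or limit."""
    a, b = alpha
    if a == 0 and b == 0:
        return "zero"
    if b > 0:
        # (a, b) is the successor of (a, b - 1).
        return "successor"
    # b == 0 and a > 0: nothing sits immediately below it.
    return "limit"


examples = [(0, 0), (0, 7), (1, 0), (1, 3), (2, 0)]
kinds = [kind(alpha) for alpha in examples]

# Python compares tuples lexicographically, which agrees with the ordinal
# order for this encoding: every finite number sits below omega.
finite_below_omega = (0, 10**6) < (1, 0)
```

Here `(1, 0)` plays the role of ${\omega}$ and `(2, 0)` the role of ${2\omega}$; both come out as limits, which is exactly why the ${+1}$ step alone never reaches them.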

4. Wrapping Up Functional Equations

Let ${S_n}$ denote the set of “base” numbers we have at the ${n}$th step. In our example, we might have

$\displaystyle S_1 = \left\{ 1 \right\}, \quad S_2 = \left\{ 1, \sqrt 2 \right\}, \quad S_3 = \left\{ 1, \sqrt 2, \sqrt 3 \right\}, \quad S_4 = \left\{ 1, \sqrt 2, \sqrt 3, \pi \right\}, \quad \dots$

and we’d like to keep building up ${S_i}$ until we can express all real numbers. For completeness, let me declare ${S_0 = \varnothing}$.

First, I need to be more precise about “independent”. Intuitively, this construction is working because

$\displaystyle a + b \sqrt 2 + c \sqrt 3 + d \pi$

is never going to equal zero for rational numbers ${a}$, ${b}$, ${c}$, ${d}$ (other than all zeros). In general, a set ${X}$ of numbers is “independent” if the combination

$\displaystyle c_1x_1 + c_2x_2 + \dots + c_mx_m = 0$

never occurs for rational numbers ${c_1, \dots, c_m \in {\mathbb Q}}$ unless ${c_1 = c_2 = \dots = c_m = 0}$. Here ${x_i \in X}$ are distinct. Note that even if ${X}$ is infinite, I can only take finite sums! (This notion has a name: we want ${X}$ to be linearly independent over ${{\mathbb Q}}$.)

When do we stop? We’d like to stop when we have a set ${S_{\text{something}}}$ that’s so big, every real number can be written in terms of the independent numbers. (This notion also has a name: it’s called a ${{\mathbb Q}}$-basis.) Let’s call such a set spanning; we stop once we hit a spanning set.

The idea that we can induct still seems okay: suppose ${S_\alpha}$ isn’t spanning. Then there’s some number that is independent of ${S_\alpha}$, say ${\sqrt{2015}\pi}$ or something. Then we just add it to get ${S_{\alpha+1}}$. And we keep going.

Unfortunately, as I said before it’s not enough to be able to go from ${S_\alpha}$ to ${S_{\alpha+1}}$ (successor case); we need to handle the limit case as well. But it turns out there’s a trick we can do. Suppose we’ve constructed all the sets ${S_0}$, ${S_1}$, ${S_2}$, …, one for each nonnegative integer ${n}$, and none of them are spanning. The next thing I want to construct is ${S_\omega}$; somehow I have to “jump”. To do this, I now take the infinite union

$\displaystyle S_\omega \overset{\text{def}}{=} S_0 \cup S_1 \cup S_2 \cup \dots.$

The elements of this set are also independent (why?).

Ta-da! With the simple trick of “union all the existing sets”, we’ve just jumped the hurdle to the first limit ordinal ${\omega}$. Then we can construct ${S_{\omega+1}}$, ${S_{\omega+2}}$, …, and for the next limit we just do the same trick of “union-ing” all the previous sets.

So we can formalize the process as follows:

1. Let ${S_0 = \varnothing}$.
2. For a successor stage ${S_{\alpha+1}}$, add to ${S_\alpha}$ any element that is independent of ${S_\alpha}$ to obtain ${S_{\alpha+1}}$.
3. For a limit stage ${S_{\lambda}}$, take the union ${\bigcup_{\gamma < \lambda} S_\gamma}$.
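The same climbing can be watched in a toy setting where it finishes after finitely many steps. The sketch below is only a finite-dimensional analogue, not the construction for ${{\mathbb R}}$: it works in ${{\mathbb Q}^3}$, so no limit stage is ever needed. The function names and the candidate list are my own choices for illustration.

```python
from fractions import Fraction


def rank(vectors):
    """Rank over Q of a list of equal-length vectors, via Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def is_independent(vectors):
    """A list is linearly independent over Q exactly when its rank equals its length."""
    return rank(vectors) == len(vectors)


def greedy_basis(candidates):
    """Successor steps only: add each candidate that keeps the set independent."""
    basis = []  # plays the role of S_0, the empty set
    for v in candidates:
        if is_independent(basis + [v]):
            basis.append(v)  # S_alpha becomes S_{alpha+1}
    return basis


candidates = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 3)]
basis = greedy_basis(candidates)
```

On this input the climb keeps `(1, 0, 0)`, `(1, 1, 0)` and `(0, 0, 3)` and rejects the other two, since each of those is already a rational combination of what came before.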

How do we know that we’ll stop eventually? Well, the thing is that this process consumes a lot of real numbers. In particular, the ordinals get larger than the size of ${{\mathbb R}}$. Hence if we don’t stop we will quite literally reach a point where we have used up every single real number. Clearly that’s impossible, because by then the elements can’t possibly be independent! (EDIT Dec 20 2015: To be clear, the claim that “ordinals get larger than the size of the reals” requires the Axiom of Choice; one can’t do this construction using transfinite induction alone. Thanks reddit for calling me out on this.)

So by transfinite recursion (and Choice), we eventually hit some ${S_\gamma}$ which is spanning: the elements are all independent, but every real number can be expressed using it. Done! This set has a name: a Hamel basis.

5. Zorn’s Lemma

Now I can tell you what Zorn’s Lemma is: it lets us do the same thing in any poset.

We can think of the above example as follows: consider all sets of independent elements. These form a partially ordered set by inclusion, and what we did was quite literally climb up a chain

$\displaystyle S_0 \subset S_1 \subset S_2 \subset \dots.$

It’s not quite climbing since we weren’t just going one step at a time: we had to do “jumps” to get up to ${S_\omega}$ and resume climbing. But the main idea is to climb up a poset until we’re at the very top; in the previous case, when we reached the spanning set.

The same thing works verbatim with any partially ordered set ${\mathcal P}$. Let’s define some terminology. A local maximum (or maximal element) of the entire poset ${\mathcal P}$ is an element which has no other elements strictly greater than it.

Now a chain of length ${\gamma}$ is a set of elements ${p_\alpha}$ for every ${\alpha < \gamma}$ such that ${p_0 < p_1 < p_2 < \dots}$. (Observe that a chain has a last element if and only if ${\gamma}$ is a successor ordinal, like ${\omega+3}$.) An upper bound to a chain is an element ${\tilde p}$ which is greater than or equal to all elements of the chain. In particular, if ${\gamma}$ is a successor ordinal, then just taking the last element of the chain works.

In this language, Zorn’s Lemma states that

Theorem 3 (Zorn’s Lemma) Let ${\mathcal P}$ be a nonempty partially ordered set. If every chain has an upper bound, then ${\mathcal P}$ has a local maximum.

Chains with length equal to a successor ordinal always have upper bounds, but this is not true in the limit case. So the hypothesis of Zorn’s Lemma is exactly what lets us “jump” up to define ${p_\omega}$ and other limit ordinals. And the proof of Zorn’s Lemma is straightforward: keep climbing up the poset at successor stages, using Zorn’s condition to jump up at limit stages, and thus building a really long chain. But we have to eventually stop, or we literally run out of elements of ${\mathcal P}$. And the only possible stopping point is a local maximum.
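In a finite poset the limit stages never show up, so the climbing argument collapses to a loop. Here is a minimal sketch under that assumption; the function `climb` and the divisibility example are hypothetical illustrations, not part of the lemma.

```python
def climb(elements, less_than, start):
    """Climb from `start` until no strictly greater element exists.

    Assumes `elements` is finite and `less_than` is a strict partial order,
    so the loop must terminate at a maximal element.
    """
    elements = list(elements)
    chain = [start]
    while True:
        above = [q for q in elements if less_than(chain[-1], q)]
        if not above:
            return chain  # the last entry is a local maximum
        # Any choice works; taking min() just makes the run reproducible.
        chain.append(min(above))


def strictly_divides(a, b):
    """Strict divisibility order on positive integers."""
    return a != b and b % a == 0


chain = climb(range(1, 13), strictly_divides, 1)
```

Starting from 1 in the divisibility order on the numbers 1 through 12, this climbs 1, 2, 4, 8 and stops, because no other number up to 12 is a multiple of 8. A different choice at each step would land on a different local maximum, such as 12.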

If we want to phrase our previous solution in terms of Zorn’s Lemma, we’d say: Proof: Look at the poset whose elements are sets of independent real numbers. Every chain ${S_0 \subset S_1 \subset \dots}$ has an upper bound ${\bigcup S_\alpha}$ (which you have to check is actually an element of the poset). Thus by Zorn, there is a local maximum ${S}$. Then ${S}$ must be spanning, because otherwise we could add an element to it. $\Box$

So really, Zorn’s Lemma is encoding all of the work of climbing that I argued earlier. It’s a neat little package that captures all the boilerplate, and tells you exactly what you need to check.

One last thing you might ask: where is the Axiom of Choice used? Well, the idea is that for any chain there could be lots of ${\tilde p}$’s, and you need to pick one of them. Since you are making arbitrary choices infinitely many times, you need the Axiom of Choice. But really, it’s nothing special. [EDIT: AM points out that in order to talk about cardinalities in Theorem 1, one also needs the Axiom of Choice.]

6. Conclusion

In the words of Timothy Gowers,

If you are building a mathematical object in stages and find that (i) you have not finished even after infinitely many stages, and (ii) there seems to be nothing to stop you continuing to build, then Zorn’s lemma may well be able to help you.

Really, there’s nothing tricky at all here. People seem scared of Zorn’s Lemma, and claim it’s not intuitive or something. But really, all we’re doing is climbing up a poset. Nothing tricky at all.

# Set Theory, Part 2: Constructing the Ordinals

This is a continuation of my earlier set theory post. In this post, I’ll describe the next three axioms of ZF and construct the ordinal numbers.

1. The Previous Axioms

As review, here are the natural descriptions of the five axioms we covered in the previous post.

Axiom 1 (Extensionality) Two sets are equal if they have the same elements.

Axiom 2 (Empty Set Exists) There exists an empty set ${\varnothing}$ which contains no elements.

Axiom 3 (Pairing) Given two elements ${x}$ and ${y}$, there exists a set ${\{x,y\}}$ containing only those two elements. (It is permissible to have ${x=y}$, meaning that if ${x}$ is a set then so is ${\{x\}}$.)

Axiom 4 (Union) Given a set ${a}$, we can create ${\cup a}$, the union of the elements of ${a}$. For example, if ${a = \{ \{1,2\}, \{3,4\} \}}$, then ${\cup a = \{1,2,3,4\}}$ is a set.

Axiom 5 (Power Set) Given any set ${x}$, the power set ${\mathcal P(x)}$ is a set.

I’ll comment briefly on what these let us do now. Let ${V_0 = \varnothing}$, and recursively define ${V_{n+1} = \mathcal P(V_n)}$. So for example,

$\displaystyle \begin{aligned} V_0 &= \varnothing \\ V_1 &= \{\varnothing\} \\ V_2 &= \{ \varnothing, \{\varnothing\} \} \\ V_3 &= \Big\{ \varnothing, \{\varnothing\}, \{\{\varnothing\}\}, \big\{\varnothing, \{\varnothing\} \big\}\Big\} \\ &\phantom=\vdots \end{aligned}$
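If you want to see these levels grow, here is a short sketch that builds them with Python’s `frozenset`. The helper names `power_set` and `V` are my own; the only thing taken from the text is the recursion ${V_{n+1} = \mathcal P(V_n)}$.

```python
from itertools import combinations


def power_set(s):
    """Return the set of all subsets of s, each as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(combo)
        for r in range(len(items) + 1)
        for combo in combinations(items, r)
    )


def V(n):
    """Build V_n by applying power_set n times to the empty set."""
    level = frozenset()
    for _ in range(n):
        level = power_set(level)
    return level


sizes = [len(V(n)) for n in range(5)]
```

The sizes come out as 0, 1, 2, 4, 16, and the next level already has ${2^{16}}$ elements, so I would not go much further than that on a real machine.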

Now let’s drop the formalism for a moment and go on a brief philosophical musing. Suppose we have a universe ${V_\omega}$ (I’ll explain later what ${\omega}$ is) where the only sets are those which appear in some ${V_n}$. You might then see, in fact, that the sets in ${V_\omega}$ actually obey all five axioms above. What we’ve done is provide a model for which the five axioms are consistent.

But this is a pretty boring model right now for the following reason: even though there are infinitely many sets, there are no infinite sets. In a moment I’ll tell you how we can add new axioms to make infinite sets. But first let me tell you how we construct the natural numbers.

2. The Axiom of Foundation

We’re about to wade into the territory of the infinite, so first I need an axiom to protect us from really bad stuff happening. What I’m going to do is forbid infinite ${\in}$ chains.

Axiom 6 (Foundation) Loosely, there is no infinite chain of sets

$\displaystyle x_0 \ni x_1 \ni x_2 \ni \dots.$

You can see why this seems reasonable: if I take a random set, I can hop into one of its elements. That’s itself a set, so I can jump into that guy and keep going down. In the finite universe ${V_\omega}$, you can see that eventually I’ll hit ${\varnothing}$, the very bottom of the universe. I want the same to still be true even if my sets are infinite.

This isn’t the actual statement of the axiom. The way to say this in machine code is that for any nonempty set ${x}$, there exists a ${y \in x}$ such that ${z \notin y}$ for any ${z \in x}$. We can’t actually write about something like ${x_0 \ni x_1 \ni \dots}$ in machine code (yet). Nevertheless this suffices for our axioms.

There’s an important consequence of this.

Theorem 1 ${x \notin x}$ for any set ${x}$.

Proof: For otherwise we would have ${x \ni x \ni x \ni \dots}$ which violates Foundation. $\Box$

3. The Natural Numbers

Note: in set theory, ${0}$ is considered a natural number.

Now for the fun part. If we want to encode math statements into the language of set theory, the first thing we’d want to do is encode the numbers ${0}$, ${1}$, ${\dots}$ in there so that we can actually do arithmetic. How might we do that?

What we’re going to do is construct a sequence of sets of sizes ${0}$, ${1}$, ${\dots}$ and let these correspond to the natural numbers. What sets should we choose? Well, there’s only one set of size ${0}$, so we begin by writing

$\displaystyle 0 \overset{\text{def}}{=} \varnothing.$

I’ll give away a little more than I should and then write

$\displaystyle 1 \overset{\text{def}}{=} \{\varnothing\}, \quad 2 \overset{\text{def}}{=} \{\varnothing, \{\varnothing\} \}.$

Now let’s think about ${3}$. If we want to construct a three-element set and we already have a two-element set, then we just need to add another element to ${2}$ that’s not already in there. In other words, to construct ${3}$ I just need to pick an ${x}$ such that ${x \notin 2}$, then let ${3 = \{x\} \cup 2}$. (Or in terms of our axioms, ${3 = \cup \left\{ 2, \{x\} \right\}}$.) Now what’s an easy way to pick ${x}$ such that ${x \notin 2}$? Answer: pick ${x=2}$. By the earlier theorem, we always have ${2 \notin 2}$.

Now the cat’s out of the bag! We define

$\displaystyle \begin{aligned} 0 &= \varnothing \\ 1 &= \left\{ 0 \right\} \\ 2 &= \left\{ 0, 1 \right\} \\ 3 &= \left\{ 0, 1, 2 \right\} \\ 4 &= \left\{ 0, 1, 2, 3 \right\} \\ &\phantom= \vdots \end{aligned}$

And there you have it: the nonnegative integers. You can have some fun with this definition and write things like

$\displaystyle \left\{ x \in 8 \mid x \text{ is even} \right\} = \left\{ 0, 2, 4, 6 \right\}$

now. Deep down, everything’s a set.
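Here is a small sketch of this encoding, using `frozenset` to stand in for sets. The names `successor` and `von_neumann` are my own; the construction itself is just ${n+1 = n \cup \{n\}}$ as above.

```python
def successor(n):
    """Return n ∪ {n}, the next natural number in this encoding."""
    return n | frozenset({n})


def von_neumann(k):
    """Build the set that encodes the natural number k."""
    n = frozenset()  # 0 is the empty set
    for _ in range(k):
        n = successor(n)
    return n


eight = von_neumann(8)

# The number encoded by a set x is its size, so "x is even" means len(x) is even.
evens = frozenset(x for x in eight if len(x) % 2 == 0)
```

The set `evens` has exactly four elements, namely the encodings of 0, 2, 4 and 6, matching the display above.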

4. Finite Ordinals

We’re now able to do some counting, because we’ve defined the sequence of sets

$\displaystyle 0, 1, 2, \dots$

by ${0 = \varnothing}$ and ${n+1 = \{0,\dots,n\}}$. This sequence is related by

$\displaystyle 0 \in 1 \in 2 \in \dots.$

Some properties of these “numbers” I’ve made are:

• They are well-ordered by ${\in}$ (which corresponds exactly with the ${<}$ which we’re familiar with; that’s a good motivation for choosing this construction, as the well-ordering property is one of the most important properties of ${\mathbb N}$, and using ${\in}$ for this purpose lets us do this ordering painlessly). That means if I take the elements of ${n}$, then I can sort them in a chain like I’ve done above: for any distinct ${x}$ and ${y}$, either ${x \in y}$ or ${y \in x}$. For example, the elements of ${4}$ are ${0}$, ${1}$, ${2}$, ${3}$ and ${0 \in 1 \in 2 \in 3}$. It also means that any nonempty subset has a “minimal” element, which would just be the first element of the chain.
• The set ${n}$ is transitive. What this means is that it is a “closed universe” in the sense that if I look at an element ${a}$ of ${n}$, all the elements of ${a}$ are also in ${n}$. For example, if I take the element ${3}$ of ${5 = \{0,1,2,3,4\}}$, all the elements of ${3}$ are in ${5}$. Looking deeply into ${n}$ won’t find me anything I didn’t see at the top level.

In other words, a set ${S}$ is transitive if for every ${T \in S}$, ${T \subseteq S}$.

A set which is both transitive and well-ordered by ${\in}$ is called an ordinal, and the numbers ${0,1,2,\dots}$ are precisely the finite ordinals. But now I’d like to delve into infinite numbers. Can I define some form of “infinity”?
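For finite sets, both conditions can be checked mechanically. The sketch below is restricted to hereditarily finite sets built from `frozenset`, and the function names are my own. A `frozenset` can never contain an ${\in}$-cycle, so for these finite sets pairwise comparability is enough to get a well-ordering.

```python
def is_transitive(s):
    """Every element of s is also a subset of s."""
    return all(t <= s for t in s)


def is_totally_ordered_by_membership(s):
    """Any two distinct elements x, y of s satisfy x ∈ y or y ∈ x."""
    return all(x == y or x in y or y in x for x in s for y in s)


def is_ordinal(s):
    """Finite check only: transitive and totally ordered by ∈."""
    return is_transitive(s) and is_totally_ordered_by_membership(s)


zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
three = frozenset({zero, one, two})

# V_3 from earlier: {0, 1, {1}, 2}. It is transitive, but {1} and 2 are
# not comparable by ∈, so it is not an ordinal.
v3 = frozenset({zero, one, frozenset({one}), two})

# {0, 2} is totally ordered by ∈ but misses 1, so it is not transitive.
gap = frozenset({zero, two})
```

So `three` passes both tests, `v3` fails the ordering test, and `gap` fails transitivity, which shows that the two conditions really are independent of each other.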

5. Infinite Ordinals

To tell you what a set is, I only have to tell you who its elements are. And so I’m going to define the set

$\displaystyle \omega = \left\{ 0, 1, 2, \dots \right\}.$

And now our counting looks like this.

$\displaystyle 0, 1, 2, \dots, \omega.$

We just tacked on an infinity at the end by scooping up all the natural numbers and collecting them in a set. You can do that? Sure you can! All I’ve done is written down the elements of the set, and you can check that ${\omega}$ is indeed an ordinal.

Well, okay, there’s one caveat. We don’t actually know whether the ${\omega}$ I’ve written down is a set. Pairing and Union let us collect any finite collection of sets into a single set, but they don’t let us collect things into an infinite set. In fact, you can’t prove from the axioms so far that ${\omega}$ is a set.

For this I’m going to present another two axioms. These are much more technical to describe, so I’ll lie to you about what their statements are. If you’re interested in the exact statements, consult the lecture notes linked at the bottom of this post.

Axiom 7 (Replacement) Loosely, let ${f}$ be a function on a set ${X}$. Then the image of ${f}$ is a set:

$\displaystyle \exists Y : \quad \forall y, \; y \in Y \iff \exists x \in X : f(x) = y.$

Axiom 8 (Infinity) There exists a set ${\omega = \{0,1,2,\dots\}}$.

With these two axioms, we can now write down the first infinite ordinal ${\omega}$. So now our list of ordinals is

$\displaystyle 0, 1, 2, \dots, \omega.$

But now in the same way we constructed ${3}$ from ${2}$, we can construct a set

$\displaystyle \omega + 1 \overset{\text{def}}{=} \omega \cup \{\omega\} = \left\{ 0, 1, \dots, \omega \right\}$

and then

$\displaystyle \omega + 2 \overset{\text{def}}{=} (\omega+1) \cup \{\omega+1\} = \left\{ 0, 1, \dots, \omega, \omega+1 \right\}$

and so on — all of these are also transitive and well-ordered by ${\in}$. So now our list of ordinals is

$\displaystyle 0, 1, 2, \dots, \omega, \omega+1, \omega+2, \dots.$

Well, can we go on further? What we’d like is to define an “${\omega+\omega}$ or ${2 \cdot \omega}$”, which would entail capturing all of the above elements into a set. Well, I claim we can do so. Consider a function ${f}$ defined on ${\omega}$ which sends ${n}$ to ${\omega+n}$. So ${f(0) = \omega}$, ${f(3) = \omega+3}$, ${f(2014) = \omega+2014}$. (Strictly, I have to prove that the set-encoding of this function, namely ${\{(0,\omega), (1,\omega+1), \dots\}}$, is actually a set. But let’s put that aside for now.) Then Replacement tells me that I have a set

$\displaystyle \left\{ f(0), f(1), \dots \right\} = \left\{ \omega, \omega+1, \omega+2, \dots \right\}$

From here we can union this set with ${\omega}$ to achieve the set ${\left\{ 0,1,2,\dots,\omega,\omega+1,\omega+2,\dots \right\}}$. And we can keep turning this wheel repeatedly, yielding the ordinal numbers.

$\displaystyle \begin{aligned} 0, & 1, 2, 3, \dots, \omega \\ & \omega+1, \omega+2, \dots, \omega+\omega \\ & 2\omega+1, 2\omega+2, \dots, 3\omega \\ & \vdots \\ & \omega^2 + 1, \omega^2+2, \dots \\ & \vdots \\ & \omega^3, \dots, \omega^4, \dots, \omega^\omega \\ & \vdots \\ & \omega^{\omega^{\omega^{\dots}}} \end{aligned}$

I won’t say much more about these ordinal numbers since the post is already getting pretty long, but I’ll mention that the ordinals might not correspond to a type of counting that you’re used to, in the sense that there is a bijection between the sets ${\omega}$ and ${\omega+1}$. It might seem like different numbers should have different “sizes”. For this you stumble into the cardinal numbers: it turns out that a cardinal number is just defined as an ordinal number which is not in bijection with any smaller ordinal number.
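One such bijection between ${\omega}$ and ${\omega+1}$ is easy to write down: shift everything by one to make room for the new top element. In this sketch the string `"omega"` is just a hypothetical stand-in for the element ${\omega}$ of ${\omega+1}$, and the function names are mine.

```python
OMEGA = "omega"  # hypothetical label for the extra element of omega + 1


def to_omega_plus_one(n):
    """Map omega -> omega + 1: send 0 to omega, and n >= 1 to n - 1."""
    return OMEGA if n == 0 else n - 1


def to_omega(x):
    """Inverse map omega + 1 -> omega: send omega to 0, and n to n + 1."""
    return 0 if x == OMEGA else x + 1


# Spot-check that the two maps undo each other on an initial segment.
round_trip_ok = all(to_omega(to_omega_plus_one(n)) == n for n in range(1000))
```

Of course a computer can only spot-check finitely many values, but the two formulas are visibly inverse to each other, which is the whole proof.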

6. A Last Few Axioms

I’ll conclude this exposition on ZFC with a few last axioms. First is the axiom called Comprehension, though it actually can be proven from the Replacement axiom.

Axiom 9 (Comprehension) Let ${\phi}$ be some property, and let ${S}$ be a set. Then the notion

$\displaystyle \left\{ x \in S \mid \phi(x) \right\}$

is a set. More formally,

$\displaystyle \exists X \forall x : \; x \in X \iff \left( (x \in S) \land \phi(x) \right).$

Notice that the comprehension must be restricted: we can only take subsets of existing sets. From this one can deduce that there is no set of all sets; if we had such a set ${V}$, then we could use Comprehension to generate ${\{x \in V : x \notin x\}}$, recovering Russell’s Paradox.

So anyways, this means that we can take list comprehensions.
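The analogy with programming is quite literal. Here is a minimal sketch, where the helper `comprehension` and the sample set are my own inventions: the point is only that the result is always carved out of a set ${S}$ that already exists.

```python
def comprehension(s, phi):
    """Return {x in s | phi(x)}; the result is always a subset of s."""
    return frozenset(x for x in s if phi(x))


S = frozenset(range(10))
evens = comprehension(S, lambda x: x % 2 == 0)

# The restricted form can never produce anything outside S.
stays_inside = evens <= S
```

There is no way to call `comprehension` without handing it an ambient set first, which mirrors the restriction in the axiom.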

Finally, I’ll touch a little on why the Axiom of Choice is actually important. You’ve probably heard the phrasing that “you can pick a sock out of every drawer” or some cute popular math phrasing like that. Here’s the precise statement.

Axiom 10 (Choice) Let ${\mathcal F}$ be a set such that ${\varnothing \notin \mathcal F}$. Then we can define a function ${g}$ on ${\mathcal F}$ such that ${g(y) \in y}$ for every ${y \in \mathcal F}$. The function ${g}$ is called a choice function; given a bunch of sets ${\mathcal F}$, it chooses an element ${g(y)}$ out of every ${y}$. In other words, for any ${\mathcal F}$ with ${\varnothing \notin \mathcal F}$, there exists a set

$\displaystyle \left\{ (y,g(y)) \mid y \in \mathcal F \right\}.$

with ${g(y) \in y}$ for every ${y}$.
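To see the shape of the object being promised, here is a toy sketch with a finite family of sets of integers. This is an assumption-laden illustration: for a finite family with an explicit rule like `min`, no axiom is needed at all, so the code only shows what a choice function looks like as a set of pairs.

```python
# A hypothetical family F of nonempty sets.
family = frozenset({
    frozenset({3, 1, 4}),
    frozenset({1, 5}),
    frozenset({9, 2, 6}),
})

# A choice function g, stored as a dictionary y -> g(y).
g = {y: min(y) for y in family}

# The same function as a set of ordered pairs (y, g(y)).
graph = frozenset((y, g[y]) for y in family)

every_choice_valid = all(g[y] in y for y in family)
```

The axiom earns its keep precisely when no rule like `min` is available and the family is infinite.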

In light of the discussion in these posts, what’s significant is not that we can conceive such a function (how hard can it be to take an element out of a nonempty set?) but that the resulting structure is actually a set. The whole point of having the ZF axioms is that we have to be careful about how we are allowed to make new sets out of old ones, so that we don’t run ourselves into paradoxes. The Axiom of Choice reflects that this is a subtle issue.

So there you have it, the axioms of ZFC and what they do.

Thanks to Peter Koellner, Harvard, for his class Math 145a, which taught me this material. My notes for this course can be downloaded from my math website.

Thanks to J Wu for pointing out a typo in Replacement and noting that I should emphasize how ${\in}$ leads to the well-ordering for the ordinals.