Models of ZFC

Model theory is really meta, so you will have to pay attention here.

Roughly, a “model of {\mathsf{ZFC}}” is a set with a binary relation that satisfies the {\mathsf{ZFC}} axioms, just as a group is a set with a binary operation that satisfies the group axioms. Unfortunately, unlike with groups, it is very hard for me to give interesting examples of models, for the simple reason that we are literally trying to model the entire universe.

1. Models

Prototypical example for this section: {(\omega, \in)} obeys {\mathrm{PowerSet}}, {V_\kappa} is a model for {\kappa} inaccessible (later).

Definition 1 A model {\mathscr M} consists of a set {M} and a binary relation {E \subseteq M \times M}. (The {E} relation is the “{\in}” for the model.)

Remark 2 I’m only considering set-sized models where {M} is a set. Experts may be aware that I can actually play with {M} being a class, but that would require too much care for now.

If you have a model, you can ask certain things about it. For example, you can ask “does it satisfy {\mathrm{EmptySet}}?”. Let me give you an example of what I mean, and then make it rigorous.

Example 3 (A Stupid Model) Let’s take {\mathscr M = (M,E) = \left( \omega, \in \right)}. This is not a very good model of {\mathsf{ZFC}}, but let’s see if we can make sense of some of the first few axioms.

  1. {\mathscr M} satisfies {\mathrm{Extensionality}}, which is the sentence

\displaystyle \forall x \forall y : \left( \forall a : a \in x \iff a \in y \right) \implies x = y.

    This just follows from the fact that {E} is actually {\in}.


  2. {\mathscr M} satisfies {\mathrm{EmptySet}}, which is the sentence

    \displaystyle \exists a : \forall x \; \neg (x \in a).

    Namely, take {a = \varnothing \in \omega}.

  3. {\mathscr M} does not satisfy {\mathrm{Pairing}}, since {\{1,3\}} is not in {\omega}, even though {1, 3 \in \omega}.
  4. Miraculously, {\mathscr M} satisfies {\mathrm{Union}}, since for any {n \in \omega}, {\cup n} is {n-1} (unless {n = 0}, in which case {\cup n = 0}). The Union axiom states that

    \displaystyle \forall a \exists z \quad \forall x \; (x \in z) \iff (\exists y : x \in y \in a).

    An important thing to notice is that the “{\forall a}” ranges only over the sets in the model {\mathscr M}.

Example 4 (Important: This Stupid Model Satisfies {\mathrm{PowerSet}}) Most incredibly of all: {\mathscr M = (\omega, \in)} satisfies {\mathrm{PowerSet}}. This is a really important example. You might think this is ridiculous. Look at {2 = \{0,1\}}. The power set of this is {\{0, 1, 2, \{1\}\}} which is not in the model, right?

Well, let’s look more closely at {\mathrm{PowerSet}}. It states that:

\displaystyle \forall x \exists a \forall y (y \in a \iff y \subseteq x).

What happens if we set {x = 2 = \{0,1\}}? Well, actually, we claim that {a = 3 = \{0,1,2\}} works. The key point is “for all {y}” — this only ranges over the objects in {\mathscr M}. In {\mathscr M}, the only subsets of {2} are {0 = \varnothing}, {1 = \{0\}} and {2 = \{0,1\}}. The “set” {\{1\}} in the “real world” (in {V}) is not a set in the model {\mathscr M}.

In particular, you might say that in this strange new world, we have {2^n = n+1}, since {n = \{0,1,\dots,n-1\}} really does have only {n+1} subsets.
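The claim {2^n = n+1} can be checked concretely by encoding von Neumann naturals as nested frozensets. A minimal sketch (the helper names `von_neumann` and `subsets_inside_omega` are mine):

```python
# Encode the von Neumann natural n = {0, 1, ..., n-1} as a nested frozenset.
def von_neumann(n):
    return frozenset(von_neumann(k) for k in range(n))

# Inside the model (omega, in), the "subsets of x" are just those elements
# of omega that happen to be subsets of x in the real world.
def subsets_inside_omega(n, bound=20):
    x = von_neumann(n)
    return [m for m in range(bound) if von_neumann(m) <= x]

# n = {0, ..., n-1} has exactly the n+1 "subsets" 0, 1, ..., n within omega:
print(subsets_inside_omega(2))  # -> [0, 1, 2]
```

The "real" subset {\{1\}} never shows up because it is simply not an element of the model.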

Example 5 (Sentences with Parameters) The sentences we ask of our model are allowed to have “parameters” as well. For example, if {\mathscr M = (\omega, \in)} as before then {\mathscr M} satisfies the sentence

\displaystyle \forall x \in 3 (x \in 5).

2. Sentences and Satisfaction

With this intuitive notion, we can define what it means for a model to satisfy a sentence.

Definition 6 Note that any sentence {\phi} can be written in one of the following five forms:

  • {x \in y}
  • {x = y}
  • {\neg \psi} (“not {\psi}”) for some shorter sentence {\psi}
  • {\psi_1 \lor \psi_2} (“{\psi_1} or {\psi_2}”) for some shorter sentences {\psi_1}, {\psi_2}
  • {\exists x \psi} (“there exists {x} such that {\psi}”) for some shorter sentence {\psi}.

Ques 7 What happened to {\land} (and) and {\forall} (for all)? (Hint: use {\neg}.)

Often (almost always, actually) we will proceed by so-called “induction on formula complexity”, meaning that we define or prove something by induction using this. Note that we require all formulas to be finite.

Now suppose we have a sentence {\phi}, like {a = b} or {\exists a \forall x \neg (x \in a)}, plus a model {\mathscr M = (M,E)}. We want to ask whether {\mathscr M} satisfies {\phi}.

To give meaning to this, we have to designate certain variables as parameters. For example, if I asked you “Does {a=b}?” the first question you would ask is what {a} and {b} are. So {a}, {b} would be parameters: I have to give them values for this sentence to make sense.

On the other hand, if I asked you “Does {\exists a \forall x \neg (x \in a)}?” then you would just say “yes”. In this case, {x} and {a} are not parameters. In general, parameters are those variables whose meaning is not given by some {\forall} or {\exists}.

In what follows, we will let {\phi(x_1, \dots, x_n)} denote a formula {\phi}, whose parameters are {x_1}, \dots, {x_n}. Note that possibly {n=0}, for example all {\mathsf{ZFC}} axioms have no parameters.

Ques 8 Try to guess the definition of satisfaction before reading it below. (It’s not very hard to guess!)

Definition 9 Let {\mathscr M=(M,E)} be a model. Let {\phi(x_1, \dots, x_n)} be a sentence, and let {b_1, \dots, b_n \in M}. We will define a relation

\displaystyle \mathscr M \vDash \phi[b_1, \dots, b_n]

and say {\mathscr M} satisfies the sentence {\phi} with parameters {b_1, \dots, b_n}.

The relationship is defined by induction on formula complexity as follows:

  • If {\phi} is “{x_1=x_2}” then {\mathscr M \vDash \phi[b_1, b_2] \iff b_1 = b_2}.
  • If {\phi} is “{x_1\in x_2}” then {\mathscr M \vDash \phi[b_1, b_2] \iff b_1 \; E \; b_2}.
    (This is what we mean by “{E} interprets {\in}”.)
  • If {\phi} is “{\neg \psi}” then {\mathscr M \vDash \phi[b_1, \dots, b_n] \iff \mathscr M \not\vDash \psi[b_1, \dots, b_n]}.
  • If {\phi} is “{\psi_1 \lor \psi_2}” then {\mathscr M \vDash \phi[b_1, \dots, b_n]} means {\mathscr M \vDash \psi_i[b_1, \dots, b_n]} for some {i=1,2}.
  • Most important case: suppose {\phi} is {\exists x \psi(x,x_1, \dots, x_n)}. Then {\mathscr M \vDash \phi[b_1, \dots, b_n]} if and only if

    \displaystyle \exists b \in M \text{ such that } \mathscr M \vDash \psi[b, b_1, \dots, b_n].

    Note that {\psi} has one extra parameter.

Notice where the information of the model actually gets used. We only ever use {E} in interpreting {x_1 \in x_2}; unsurprising. But we only ever use the set {M} when we are running over {\exists} (and hence {\forall}). That’s well worth keeping in mind: the behavior of a model essentially comes from {\exists} and {\forall}, which search through the entire model {M}.
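The recursive definition above can be implemented directly for finite models. Here is a minimal sketch (the tuple encoding of formulas is my own; as in the definition, a new {\exists}-witness becomes the first parameter):

```python
# A model is a set M with a set E of ordered pairs (the model's "∈" relation).
# Formulas are nested tuples:
#   ("in", i, j), ("eq", i, j)   -- atomic; i, j index into the parameter list
#   ("not", phi), ("or", p1, p2) -- connectives
#   ("exists", phi)              -- the witness gets parameter index 0,
#                                   shifting the old parameters up by one
def satisfies(M, E, phi, params=()):
    op = phi[0]
    if op == "in":
        return (params[phi[1]], params[phi[2]]) in E
    if op == "eq":
        return params[phi[1]] == params[phi[2]]
    if op == "not":
        return not satisfies(M, E, phi[1], params)
    if op == "or":
        return satisfies(M, E, phi[1], params) or satisfies(M, E, phi[2], params)
    if op == "exists":
        # The crucial case: the witness b is searched for inside M only.
        return any(satisfies(M, E, phi[1], (b,) + params) for b in M)
    raise ValueError(op)

# Example: M = {0,1,2,3} with E interpreting "∈" on von Neumann naturals,
# i.e. m E n iff m < n.  EmptySet is: exists a, not exists x, (x in a).
M = {0, 1, 2, 3}
E = {(m, n) for m in M for n in M if m < n}
empty_set = ("exists", ("not", ("exists", ("in", 0, 1))))
print(satisfies(M, E, empty_set))  # -> True (witness a = 0)
```

Note how {E} is consulted only in the atomic case, while {M} is consulted only in the `"exists"` case, exactly as remarked above.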

And finally,

Definition 10 A model of {\mathsf{ZFC}} is a model {\mathscr M = (M,E)} satisfying all {\mathsf{ZFC}} axioms.

We are especially interested in models of the form {(M, \in)}, where {M} is a transitive set. (We want our universe to be transitive, otherwise we would have elements of sets which are not themselves in the universe, which is very strange.) Such a model is called a transitive model. If {M} is a transitive set, the model {(M, \in)} will be abbreviated to just {M}.

Definition 11 An inner model of {\mathsf{ZFC}} is a transitive model satisfying {\mathsf{ZFC}}.

3. The Levy Hierarchy

Prototypical example for this section: {\mathtt{isSubset}(x,y)} is absolute. The axiom {\mathrm{EmptySet}} is {\Sigma_1}, {\mathtt{isPowerSetOf}(X,x)} is {\Pi_1}.

A key point to remember is that the behavior of a model is largely determined by {\exists} and {\forall}. It turns out we can say even more than this.

Consider a formula such as

\displaystyle \mathtt{isEmpty}(x) : \neg \exists a (a \in x)

which checks whether a given set {x} is empty. Technically, this has an “{\exists}” in it. But somehow this {\exists} does not really search over the entire model, because it is bounded to search in {x}. That is, we might informally rewrite this as

\displaystyle \neg (\exists a \in x)

which doesn’t fit into the strict form, but points out that we are only looking over {a \in x}. We call such a quantifier a bounded quantifier.

We like sentences with bounded quantifiers because they designate properties which are absolute over transitive models. It doesn’t matter how strange your surrounding model {M} is. As long as {M} is transitive,

\displaystyle M \vDash \mathtt{isEmpty}(\varnothing)

will always hold. Similarly, the sentence

\displaystyle \mathtt{isSubset}(x,y) : x \subseteq y \text{ i.e. } \forall a \in x \; (a \in y)

has only a bounded quantifier, and so it too is absolute over transitive models. Sentences with this property are called {\Sigma_0} or {\Pi_0}.
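This absoluteness can be seen concretely with hereditarily finite sets: evaluating {\mathtt{isSubset}} inside a small transitive set and inside a larger one gives the same answer, because the quantifier only ever inspects elements of {x}. A sketch (the frozenset encoding is mine):

```python
# Hereditarily finite sets as frozensets: 0 = {}, 1 = {0}, 2 = {0, 1}.
zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})

# isSubset(x, y): "forall a in x, (a in y)" -- the quantifier is bounded to x,
# so evaluating it in a transitive model M only ever inspects elements of x.
def is_subset(x, y, M):
    return all(a in y for a in x if a in M)

small = {zero, one, two}            # a tiny transitive set
big = small | {frozenset({one})}    # a larger transitive set, also containing {1}

# Absoluteness: both "models" give the same answers.
print(is_subset(one, two, small), is_subset(one, two, big))  # -> True True
print(is_subset(two, one, small), is_subset(two, one, big))  # -> False False
```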

The situation is different with a sentence like

\displaystyle \mathtt{isPowerSetOf}(y,x) : \forall z \left( z \subseteq x \iff z \in y \right)

which in English means “{y} is the power set of {x}”, or just {y = \mathcal P(x)}. The {\forall z} is not bounded here. This weirdness is what allows things like

\displaystyle \omega \vDash \text{``} \{0,1,2\} \text{ is the power set of }\{0,1\}\text{''}

and hence

\displaystyle \omega \vDash \mathrm{PowerSet}

which was our stupid example earlier. The sentence {\mathtt{isPowerSetOf}} consists of an unbounded {\forall} followed by an absolute sentence, so we say it is {\Pi_1}.
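The contrast with the bounded case can be made concrete: evaluating {\mathtt{isPowerSetOf}} requires scanning the whole model, so its truth value changes as the model grows. A sketch (encoding and names mine):

```python
# Von Neumann naturals as frozensets: 0 = {}, 1 = {0}, 2 = {0,1}, 3 = {0,1,2}.
def vn(n):
    return frozenset(vn(k) for k in range(n))

# isPowerSetOf(y, x) evaluated inside a model M: the unbounded "forall z"
# ranges over all of M, so the answer depends on which sets M contains.
def is_power_set_of(y, x, M):
    return all((z <= x) == (z in y) for z in M)

omega_model = {vn(n) for n in range(10)}      # a chunk of (omega, in)
richer = omega_model | {frozenset({vn(1)})}   # also contains the set {1}

print(is_power_set_of(vn(3), vn(2), omega_model))  # -> True: here "P(2) = 3"
print(is_power_set_of(vn(3), vn(2), richer))       # -> False: {1} ⊆ 2 but {1} ∉ 3
```

The moment the model acquires the subset {\{1\}}, the unbounded {\forall z} finds it and the sentence flips from true to false; a {\Sigma_0} sentence could never do that.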

More generally, the Levy hierarchy keeps track of how bounded our quantifiers are. Specifically,

  • Formulas which have only bounded quantifiers are {\Delta_0 = \Sigma_0 = \Pi_0}.
  • Formulas of the form {\exists x_1 \dots \exists x_k \psi} where {\psi} is {\Pi_n} are considered {\Sigma_{n+1}}.
  • Formulas of the form {\forall x_1 \dots \forall x_k \psi} where {\psi} is {\Sigma_n} are considered {\Pi_{n+1}}.

(A formula which is both {\Sigma_n} and {\Pi_n} is called {\Delta_n}, but we won’t use this except for {n=0}.)

Example 12 (Examples of {\Delta_0} Sentences)

  1. The sentences {\mathtt{isEmpty}(x)}, {x \subseteq y}, as discussed above.
  2. The formula “{x} is transitive” can be expanded as a {\Delta_0} sentence.
  3. The formula “{x} is an ordinal” can be expanded as a {\Delta_0} sentence.

Exercise 13 Write out the expansions for “{x} is transitive” and “{x} is an ordinal” in a {\Delta_0} form.

Example 14 (More Complex Formulas)

  1. The axiom {\mathrm{EmptySet}} is {\Sigma_1}; it is {\exists a (\mathtt{isEmpty}(a))}, and {\mathtt{isEmpty}(a)} is {\Delta_0}.
  2. The formula “{y = \mathcal P(x)}” is {\Pi_1}, as discussed above.
  3. The formula “{x} is countable” is {\Sigma_1}. One way to phrase it is “{\exists f} an injective map {x \hookrightarrow \omega}”, which necessarily has an unbounded “{\exists f}”.
  4. The axiom {\mathrm{PowerSet}} is {\Pi_3}:

    \displaystyle \forall y \exists P \forall x (x\subseteq y \iff x \in P).

4. Substructures, and Tarski-Vaught

Let {\mathscr M_1 = (M_1, E_1)} and {\mathscr M_2 = (M_2, E_2)} be models.

Definition 15 We say that {\mathscr M_1 \subseteq \mathscr M_2} if {M_1 \subseteq M_2} and {E_1} agrees with {E_2}; we say {\mathscr M_1} is a substructure of {\mathscr M_2}.

That’s boring. The good part is:

Definition 16 We say {\mathscr M_1 \prec \mathscr M_2}, or {\mathscr M_1} is an elementary substructure of {\mathscr M_2}, if for every sentence {\phi(x_1, \dots, x_n)} and parameters {b_1, \dots, b_n \in M_1}, we have

\displaystyle \mathscr M_1 \vDash \phi[b_1, \dots, b_n] \iff \mathscr M_2 \vDash \phi[b_1, \dots, b_n].

In other words, {\mathscr M_1} and {\mathscr M_2} agree on every sentence possible. Note that the {b_i} have to come from {M_1}; if the {b_i} came from {\mathscr M_2} then asking something of {\mathscr M_1} wouldn’t make sense.

Let’s ask now: how would {\mathscr M_1 \prec \mathscr M_2} fail to be true? If we look at the possible sentences, none of the atomic formulas, nor the “{\lor}” and “{\neg}”, are going to cause issues.

The intuition you should be getting by now is that things go wrong once we hit {\forall} and {\exists}. They won’t go wrong for bounded quantifiers. But unbounded quantifiers search the entire model, and that’s where things go wrong.

To give a “concrete example”: imagine {\mathscr M_1} is MIT, and {\mathscr M_2} is the state of Massachusetts. If {\mathscr M_1} thinks there exist hackers at MIT, certainly there exist hackers in Massachusetts. Where things go wrong is something like:

\displaystyle \mathscr M_2 \vDash \text{``} \exists x : x \text{ is a course numbered }> 50\text{''}.

This is true for {\mathscr M_2} because we can take the witness {x = \text{Math 55}}, say. But it’s false for {\mathscr M_1}, because at MIT all courses are numbered {18.701} or something similar. The issue is that the witnesses for statements in {\mathscr M_2} do not necessarily propagate down to witnesses for {\mathscr M_1}, even though they do propagate up from {\mathscr M_1} to {\mathscr M_2}.

The Tarski-Vaught test says this is the only impediment: if every witness in {\mathscr M_2} can be replaced by one in {\mathscr M_1} then {\mathscr M_1 \prec \mathscr M_2}.

Lemma 17 (Tarski-Vaught) Let {\mathscr M_1 \subseteq \mathscr M_2}. Then {\mathscr M_1 \prec \mathscr M_2} if and only if for every sentence {\phi(x, x_1, \dots, x_n)} and parameters {b_1, \dots, b_n \in M_1}: if there is a witness {\tilde b \in M_2} to {\mathscr M_2 \vDash \phi[\tilde b, b_1, \dots, b_n]} then there is a witness {b \in M_1} to {\mathscr M_2 \vDash \phi[b, b_1, \dots, b_n]}.

Proof: Easy after the above discussion. To formalize it, use induction on formula complexity. \Box

5. Obtaining the Axioms of {\mathsf{ZFC}}

Extending the above ideas, one can obtain without much difficulty the following. The idea is that almost all the {\mathsf{ZFC}} axioms are just {\Sigma_1} claims about certain desired sets, and so verifying an axiom reduces to checking some appropriate “closure” condition: that the witness to the axiom is actually in the model.

For example, the {\mathrm{EmptySet}} axiom is “{\exists a (\mathtt{isEmpty}(a))}”, and so we’re happy as long as {\varnothing \in M}, which is of course true for any nonempty transitive set {M}.

Lemma 18 (Transitive Sets Inheriting {\mathsf{ZFC}}) Let {M} be a nonempty transitive set. Then

  1. {M} satisfies {\mathrm{Extensionality}}, {\mathrm{Foundation}}, {\mathrm{EmptySet}}.
  2. {M \vDash \mathrm{Pairing}} if {x,y \in M \implies \{x,y\} \in M}.
  3. {M \vDash \mathrm{Union}} if {x \in M \implies \cup x \in M}.
  4. {M \vDash \mathrm{PowerSet}} if {x \in M \implies \mathcal P(x) \cap M \in M}.
  5. {M \vDash \mathrm{Replacement}} if for every {x \in M} and every function {F : x \rightarrow M} which is {M}-definable with parameters, we have {F``x \in M} as well.
  6. {M \vDash \mathrm{Infinity}} as long as {\omega \in M}.

Here, a set {X \subseteq M} is {M}-definable with parameters if it can be realized as

\displaystyle X = \left\{ x \in M \mid \phi[x, b_1, \dots, b_n] \right\}

for some (fixed) choice of parameters {b_1,\dots,b_n \in M}. We allow {n=0}, in which case we say {X} is {M}-definable without parameters. Note that {X} need not itself be in {M}! As a trivial example, {X = M} is {M}-definable without parameters (just take {\phi[x]} to always be true), and certainly we do not have {X \in M}.

Exercise 19 Verify claims (1)-(4) above.

Remark 20 Converses to the statements of Lemma 18 are true for all claims other than the last one (Infinity).

6. Mostowski Collapse

Up until now I have been only talking about transitive models, because they were easier to think about. Here’s a second, better reason we might only care about transitive models.

Lemma 21 (Mostowski Collapse) Let {\mathscr X = (X,E)} be a model such that {E} is well-founded (in the real world {V}) and {\mathscr X \vDash \mathrm{Extensionality}}. Then there exists an isomorphism {\pi : \mathscr X \rightarrow M} for a transitive model {M = (M,\in)}.

This is also called the transitive collapse. In fact, both {\pi} and {M} are unique.

Proof: The idea behind the proof is very simple. Since {E} is well-founded and extensional, we can look at an {E}-minimal element {x_\varnothing} of {X}. Clearly, we want to send that to {0 = \varnothing}.

Then we take the next-smallest set under {E}, and send it to {1 = \{\varnothing\}}. We “keep doing this”; it’s not hard to see this does exactly what we want.

To formalize, define {\pi} by transfinite recursion:

\displaystyle \pi(x) \overset{\mathrm{def}}{=} \left\{ \pi(y) \mid y \; E \; x \right\}.

This {\pi}, by construction, does the trick. \Box
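The recursion {\pi(x) = \{\pi(y) \mid y \; E \; x\}} can be run verbatim on a finite well-founded, extensional relation. A sketch (the example relation on {\{a, b, c\}} is mine):

```python
from functools import lru_cache

# A well-founded, extensional relation E on X = {"a", "b", "c"}:
# nothing is E-below "a"; only "a" is below "b"; "a" and "b" are below "c".
X = {"a", "b", "c"}
E = {("a", "b"), ("a", "c"), ("b", "c")}

@lru_cache(maxsize=None)
def collapse(x):
    # pi(x) = { pi(y) : y E x }, computed by recursion on E.
    return frozenset(collapse(y) for (y, z) in E if z == x)

# "a" collapses to 0 = {}, "b" to 1 = {0}, "c" to 2 = {0, 1}:
print(collapse("c") == frozenset({frozenset(), frozenset({frozenset()})}))  # -> True
```

The opaque points {a, b, c} get replaced by honest transitive sets {0, 1, 2}, with {E} becoming genuine {\in}.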

The picture is quite literally one of “collapsing” the elements of {X} down to the bottom of {V}, hence the name.

7. Adding an Inaccessible, Skolem Hulls, and Going Insane

Prototypical example for this section: {V_\kappa}

At this point you might be asking, well, where’s my model of {\mathsf{ZFC}}?

I unfortunately have to admit now: {\mathsf{ZFC}} can never prove that there is a model of {\mathsf{ZFC}} (unless {\mathsf{ZFC}} is inconsistent, but that would be even worse). This is a consequence of Gödel’s Second Incompleteness Theorem: exhibiting a model of {\mathsf{ZFC}} would amount to proving that {\mathsf{ZFC}} is consistent.

Nonetheless, with some very modest assumptions added, we can actually show that a model does exist: for example, assuming that there exists a strongly inaccessible cardinal {\kappa} would do the trick; it turns out {V_\kappa} will be such a model. Intuitively you can see why: {\kappa} is so big that any set of rank lower than it can’t escape it, even if we take power sets or apply any other operation that {\mathsf{ZFC}} lets us do.

More pessimistically, this shows that it’s impossible to prove in {\mathsf{ZFC}} that such a {\kappa} exists. Nonetheless, we now proceed under {\mathsf{ZFC}^+} for convenience, which adds the existence of such a {\kappa} as a final axiom. So we now have a model {V_\kappa} to play with. Joy!

Great. Now we do something really crazy.

Theorem 22 (Countable Transitive Model) Assume {\mathsf{ZFC}^+}. Then there exists a transitive model {M} of {\mathsf{ZFC}} such that {M} is a countable set.

Proof: Fasten your seat belts.

Start with the set {X_0 = \varnothing}. Then for every integer {n}, we do the following to get {X_{n+1}}.

  • Start with {X_{n+1}} containing every element of {X_n}.
  • Consider a formula {\phi(x, x_1, \dots, x_n)} and {b_1, \dots, b_n} in {X_n}. Suppose that {M} thinks there is a {b \in M} for which

    \displaystyle M \vDash \phi[b, b_1, \dots, b_n].

    We then add in the element {b} to {X_{n+1}}.

  • We do this for every possible formula in the language of set theory. We also have to put in every possible set of parameters from the previous set {X_n}.

At every step {X_n} is countable. Reason: there are countably many possible finite sets of parameters in {X_n}, and countably many possible formulas, so in total we only ever add in countably many things at each step. This exhibits an infinite nested sequence of countable sets

\displaystyle X_0 \subseteq X_1 \subseteq X_2 \subseteq \dots

Unfortunately, none of the {X_n} need be an elementary substructure of {M}, because each {X_n} relies on witnesses that may appear only in {X_{n+1}}. So we instead take the union:

\displaystyle X = \bigcup_n X_n.

This satisfies the Tarski-Vaught test, and is countable.

There is one minor caveat: {X} might not be transitive. We don’t care, because we just take its Mostowski collapse, which is isomorphic to {X} and hence satisfies the same sentences. \Box

Please take a moment to admire how insane this is. It hinges irrevocably on the fact that there are countably many sentences we can write down.

Remark 23 This proof relies heavily on the Axiom of Choice when we add in the element {b} to {X_{n+1}}. Without Choice, there is no way of making these decisions all at once.

Usually, the right way to formalize the Axiom of Choice usage is, for every formula {\phi(x, x_1, \dots, x_n)}, to pre-commit (at the very beginning) to a function {f_\phi(x_1, \dots, x_n)}, such that given any {b_1, \dots, b_n}, {f_\phi(b_1, \dots, b_n)} will spit out the suitable value of {b} (if one exists). Personally, I think this is hiding the spirit of the proof, but it does make it clear how exactly Choice is being used.

These {f_\phi}’s have a name: Skolem functions.
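In a model that we can explicitly well-order, a Skolem function can be realized by always choosing the least witness; in general, making all these choices at once is exactly where Choice is needed. A toy sketch (the model and formula are mine):

```python
# Toy model: M = {0, ..., 9}, and phi(b, b1) is the formula "b1 E b",
# where E is interpreted as < on these von Neumann naturals.
M = range(10)

def phi(b, b1):
    return b1 < b

# A Skolem function for "exists b : phi(b, b1)": pre-commit to the least
# witness.  ("Least" uses the well-ordering of M; in a general model,
# picking one witness per formula at once is the Axiom of Choice.)
def f_phi(b1):
    witnesses = [b for b in M if phi(b, b1)]
    return min(witnesses) if witnesses else None

print(f_phi(3))  # -> 4
print(f_phi(9))  # -> None (no witness exists in M)
```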

The trick we used in the proof works in more general settings:

Theorem 24 (Downward Löwenheim-Skolem Theorem) Let {\mathscr M = (M,E)} be a model, and {A \subseteq M}. Then there exists a set {B} (called the Skolem hull of {A}) with {A \subseteq B \subseteq M}, such that {(B,E) \prec \mathscr M}, and

\displaystyle \left\lvert B \right\rvert \le \max \left\{ \omega, \left\lvert A \right\rvert \right\}.

In our case, what we did was simply take {A} to be the empty set.

Ques 25 Prove this. (Exactly the same proof as before.)

8. FAQ’s on Countable Models

The most common one is “how is this possible?”, with runner-up “what just happened?”.

Let me do my best to answer the first question. It seems like there are two things running up against each other:

  1. {M} is a transitive model of {\mathsf{ZFC}}, but its universe is countable.
  2. {\mathsf{ZFC}} tells us there are uncountable sets!

(This has confused so many people it has a name, Skolem’s paradox.)

The reason this works I actually pointed out earlier: countability is not absolute, it is a {\Sigma_1} notion.

Recall that a set {x} is countable if there exists an injective map {x \hookrightarrow \omega}. The first statement just says that in the universe {V}, there is an injective map {F : M \hookrightarrow \omega}. In particular, for any {x \in M} (hence {x \subseteq M}, since {M} is transitive), {x} is countable in {V}.

But for {M} to be a model of {\mathsf{ZFC}}, {M} only has to think statements in {\mathsf{ZFC}} are true. More to the point, the fact that {\mathsf{ZFC}} tells us there are uncountable sets means

\displaystyle M \vDash \exists x \text{ uncountable}.

In other words,

\displaystyle M \vDash \exists x \forall f \text{ If } f : x \rightarrow \omega \text{ then } f \text{ isn't injective}.

The key point is that the {\forall f} searches only over functions in our tiny model {M}. It is true that in the “real world” {V}, there are injective functions {f : x \rightarrow \omega}. But {M} has no idea they exist! It is a brain in a vat: {M} is oblivious to any information outside it.

So in fact, every ordinal which appears in {M} is countable in the real world. It is just not countable in {M}. Since {M \vDash \mathsf{ZFC}}, {M} is going to think there is some smallest uncountable cardinal, say {\aleph_1^M}. It will be the smallest (infinite) ordinal in {M} with the property that there is no bijection in the model {M} between {\aleph_1^M} and {\omega}. However, we necessarily know that such a bijection is going to exist in the real world {V}.

Put another way, cardinalities in {M} can look vastly different from those in the real world, because cardinality is measured by bijections, which I guess is inevitable, but leads to chaos.

9. Picturing Inner Models

Here is a picture of a countable transitive model {M}.


Note that {M} and {V} must agree on finite sets, since every finite set has a formula that can express it. However, past {V_\omega} the model and the true universe start to diverge.

The entire model {M} is countable, so it only occupies a small portion of the universe, below the first uncountable cardinal {\aleph_1^V} (where the superscript means “of the true universe {V}”). The ordinals in {M} are precisely the ordinals of {V} which happen to live inside the model, because the sentence “{\alpha} is an ordinal” is absolute. On the other hand, {M} has only a portion of these ordinals, since it is only a lowly set, and a countable set at that. To denote the ordinals of {M}, we write {\mathrm{On}^M}, where the superscript means “the ordinals as computed in {M}”. Similarly, {\mathrm{On}^V} will now denote the “set of true ordinals”.

Nonetheless, the model {M} has its own version of the first uncountable cardinal {\aleph_1^M}. In the true universe, {\aleph_1^M} is countable (below {\aleph_1^V}), but the necessary bijection witnessing this might not be inside {M}. That’s why {M} can think {\aleph_1^M} is uncountable, even if it is a countable cardinal in the original universe.

So our model {M} is a brain in a vat. It happens to believe all the axioms of {\mathsf{ZFC}}, and so every statement that is true in {M} could conceivably be true in {V} as well. But {M} can’t see the universe around it; it has no idea that what it believes is the uncountable {\aleph_1^M} is really just an ordinary countable cardinal.

10. Exercises

Problem 1 Show that for any transitive model {M}, the set of ordinals in {M} is itself some ordinal.

Problem 2 Assume {\mathscr M_1 \subseteq \mathscr M_2} are transitive models. Show that

  1. If {\phi} is {\Delta_0}, then {\mathscr M_1 \vDash \phi[b_1, \dots, b_n] \iff \mathscr M_2 \vDash \phi[b_1, \dots, b_n]}.
  2. If {\phi} is {\Sigma_1}, then {\mathscr M_1 \vDash \phi[b_1, \dots, b_n] \implies \mathscr M_2 \vDash \phi[b_1, \dots, b_n]}.
  3. If {\phi} is {\Pi_1}, then {\mathscr M_2 \vDash \phi[b_1, \dots, b_n] \implies \mathscr M_1 \vDash \phi[b_1, \dots, b_n]}.

Problem 3 (Reflection) Let {\kappa} be an inaccessible cardinal such that {|V_\alpha| < \kappa} for all {\alpha < \kappa}. Prove that for any {\delta < \kappa} there exists {\delta < \alpha < \kappa} such that {V_\alpha \prec V_\kappa}; in other words, the set of {\alpha} such that {V_\alpha \prec V_\kappa} is unbounded in {\kappa}. This means that properties of {V_\kappa} reflect down to properties of {V_\alpha}.

Problem 4 (Inaccessible Cardinal Produce Models) Let {\kappa} be an inaccessible cardinal. Prove that {V_\kappa} is a model of {\mathsf{ZFC}}.


(Standard post on cardinals, as a prerequisite for the forthcoming model theory post.)

An ordinal measures a well-ordering. However, it does not do a fantastic job at measuring size. For example, there is a bijection between the elements of {\omega} and {\omega+1}:

\displaystyle \begin{array}{rccccccc} \omega+1 = & \{ & \omega & 0 & 1 & 2 & \dots & \} \\ \omega = & \{ & 0 & 1 & 2 & 3 & \dots & \}. \end{array}

In fact, as you likely already know, there is even a bijection between {\omega} and {\omega^2}:

\displaystyle \begin{array}{l|cccccc} + & 0 & 1 & 2 & 3 & 4 & \dots \\ \hline 0 & 0 & 1 & 3 & 6 & 10 & \dots \\ \omega & 2 & 4 & 7 & 11 & \dots & \\ \omega \cdot 2 & 5 & 8 & 12 & \dots & & \\ \omega \cdot 3 & 9 & 13 & \dots & & & \\ \omega \cdot 4 & 14 & \dots & & & & \end{array}

So ordinals do not do a good job of keeping track of size. For this, we turn to the notion of a cardinal number.
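The diagonal numbering in the table above is the classical Cantor-style pairing: the entry in row {r}, column {c} receives index {\binom{r+c+1}{2} + r}, giving an explicit bijection {\omega \times \omega \rightarrow \omega}. A quick sketch (function names mine):

```python
# Enumerate omega x omega along diagonals, matching the table above
# (row r, column c; row labels 0, omega, omega*2, ... correspond to r).
def pair(r, c):
    d = r + c
    return d * (d + 1) // 2 + r

# The inverse, recovering (r, c) from an index n.
def unpair(n):
    d = 0
    while (d + 1) * (d + 2) // 2 <= n:
        d += 1
    r = n - d * (d + 1) // 2
    return (r, d - r)

# Bijectivity on an initial segment:
print(sorted(pair(r, c) for r in range(10) for c in range(10)
             if pair(r, c) < 30) == list(range(30)))  # -> True
```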

1. Equinumerous Sets and Cardinals

Definition 1 Two sets {A} and {B} are equinumerous, written {A \approx B}, if there is a bijection between them.

Definition 2 A cardinal is an ordinal {\kappa} such that for no {\alpha < \kappa} do we have {\alpha \approx \kappa}.

Example 3 (Examples of Cardinals) Every finite number is a cardinal. Moreover, {\omega} is a cardinal. However, {\omega+1}, {\omega^2}, {\omega^{2015}} are not, because they are countable.

Example 4 ({\omega^\omega} is Countable) Even {\omega^\omega} is not a cardinal, since it is a countable union

\displaystyle \omega^\omega = \bigcup_n \omega^n

and each {\omega^n} is countable.

Ques 5 Why must an infinite cardinal be a limit ordinal?

Remark 6 There is something fishy about the definition of a cardinal: it relies on an external function {f}. That is, to verify {\kappa} is a cardinal I can’t just look at {\kappa} itself; I need to examine the entire universe {V} to make sure there does not exist a bijection {f : \kappa \rightarrow \alpha} for {\alpha < \kappa}. For now this is no issue, but later in model theory this will lead to some highly counterintuitive behavior.

2. Cardinalities

Now that we have defined a cardinal, we can discuss the size of a set by linking it to a cardinal.

Definition 7 The cardinality of a set {X} is the least ordinal {\kappa} such that {X \approx \kappa}. We denote it by {\left\lvert X \right\rvert}.

Ques 8 Why must {\left\lvert X \right\rvert} be a cardinal?

Remark 9 One needs the Well-Ordering Theorem (equivalently, Choice) in order to establish that such an ordinal {\kappa} actually exists.

Since cardinals are ordinals, it makes sense to ask whether {\kappa_1 \le \kappa_2}, and so on. Our usual intuition works well here.

Proposition 10 (Restatement of Cardinality Properties) Let {X} and {Y} be sets.

  1. {X \approx Y} if and only if {\left\lvert X \right\rvert = \left\lvert Y \right\rvert}, if and only if there is a bijection between {X} and {Y}.
  2. {\left\lvert X \right\rvert \le \left\lvert Y \right\rvert} if and only if there is an injective map {X \hookrightarrow Y}.

Ques 11 Prove this.

3. Aleph Numbers

Prototypical example for this section: {\aleph_0} is {\omega}, and {\aleph_1} is the first uncountable cardinal.

First, let us check that cardinals can get arbitrarily large:

Proposition 12 We have {\left\lvert X \right\rvert < \left\lvert \mathcal P(X) \right\rvert} for every set {X}.

Proof: There is an injective map {X \hookrightarrow \mathcal P(X)} but there is no injective map {\mathcal P(X) \hookrightarrow X} by Cantor’s diagonal argument. \Box
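The proof that no map {f : X \rightarrow \mathcal P(X)} is surjective is constructive: the diagonal set {D = \{x \in X \mid x \notin f(x)\}} is always missed by {f}. A finite sketch (the particular {f} is an arbitrary example of mine):

```python
# Cantor's diagonal set D = { x in X : x not in f(x) } is never in the image of f:
# if D = f(d), then d in D iff d not in f(d) = D, contradiction.
def diagonal(X, f):
    return frozenset(x for x in X if x not in f(x))

X = frozenset({0, 1, 2})
# An arbitrary attempt at a surjection X -> P(X):
f = {0: frozenset(), 1: frozenset({0, 1}), 2: frozenset({2})}

D = diagonal(X, lambda x: f[x])
print(D)                # -> frozenset({0})
print(D in f.values())  # -> False: f misses D, as the proof predicts
```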

Thus we can define:

Definition 13 For a cardinal {\kappa}, we define {\kappa^+} to be the least cardinal above {\kappa}, called the successor cardinal.

This {\kappa^+} exists and has {\kappa^+ \le \left\lvert \mathcal P(\kappa) \right\rvert}.

Next, we claim that:

Exercise 14 Show that if {A} is a set of cardinals, then {\cup A} is a cardinal.

Thus by transfinite recursion we may define:

Definition 15 For any {\alpha \in \mathrm{On}}, we define the aleph numbers as

\displaystyle \begin{aligned} \aleph_0 &= \omega \\ \aleph_{\alpha+1} &= \left( \aleph_\alpha \right)^+ \\ \aleph_{\lambda} &= \bigcup_{\alpha < \lambda} \aleph_\alpha \quad \text{for limit ordinals } \lambda. \end{aligned}

Thus we have the following sequence of cardinals:

\displaystyle 0 < 1 < 2 < \dots < \aleph_0 < \aleph_1 < \dots < \aleph_\omega < \aleph_{\omega+1} < \dots

By definition, {\aleph_0} is the cardinality of the natural numbers, {\aleph_1} is the first uncountable cardinal, and so on.

We claim the aleph numbers constitute all the cardinals:

Lemma 16 (Aleph Numbers Constitute All Infinite Cardinals) If {\kappa} is a cardinal then either {\kappa} is finite (i.e. {\kappa \in \omega}) or {\kappa = \aleph_\alpha} for some {\alpha \in \mathrm{On}}.

Proof: Assume {\kappa} is infinite, and take {\alpha} minimal with {\aleph_\alpha \ge \kappa}. Suppose for contradiction that we have {\aleph_\alpha > \kappa}. We may assume {\alpha > 0}, since the case {\alpha = 0} is trivial.

If {\alpha = \overline{\alpha} + 1} is a successor, then

\displaystyle \aleph_{\overline{\alpha}} < \kappa < \aleph_{\alpha} = (\aleph_{\overline{\alpha}})^+

which contradicts the definition of the successor cardinal. If {\alpha = \lambda} is a limit ordinal, then {\aleph_\lambda} is the supremum {\bigcup_{\gamma < \lambda} \aleph_\gamma}. So there must be some {\gamma < \lambda} with {\aleph_\gamma > \kappa}, which contradicts the minimality of {\alpha}. \Box

Definition 17 An infinite cardinal which is not a successor cardinal is called a limit cardinal. It is exactly those cardinals of the form {\aleph_\lambda}, for {\lambda} a limit ordinal, plus {\aleph_0}.

4. Cardinal Arithmetic

Prototypical example for this section: {\aleph_0 \cdot \aleph_0 = \aleph_0 + \aleph_0 = \aleph_0}

Recall the way we set up ordinal arithmetic. Note that in particular, {\omega + \omega > \omega} and {\omega^2 > \omega}. Since cardinals count size, this property is undesirable, and we want to have

\displaystyle \begin{aligned} \aleph_0 + \aleph_0 &= \aleph_0 \\ \aleph_0 \cdot \aleph_0 &= \aleph_0 \end{aligned}

because {\omega + \omega} and {\omega \cdot \omega} are countable. In the case of cardinals, we simply “ignore order”.

The definition of cardinal arithmetic is as expected:

Definition 18 (Cardinal Arithmetic) Given cardinals {\kappa} and {\mu}, define

\displaystyle \kappa + \mu \overset{\mathrm{def}}{=} \left\lvert \left( \left\{ 0 \right\} \times \kappa \right) \cup \left( \left\{ 1 \right\} \times \mu \right) \right\rvert


\displaystyle \kappa \cdot \mu \overset{\mathrm{def}}{=} \left\lvert \mu \times \kappa \right\rvert .

Ques 19 Check this agrees with what you learned in pre-school for finite cardinals.
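For small finite cardinals this check can even be done by computer. Here is an illustrative sketch, identifying the cardinal {k} with the set {\left\{ 0, \dots, k-1 \right\}} as in the von Neumann encoding:

```python
def card_add(k, m):
    """|({0} x k) u ({1} x m)|, reading the cardinal k as the set range(k)."""
    disjoint_union = {(0, x) for x in range(k)} | {(1, y) for y in range(m)}
    return len(disjoint_union)

def card_mul(k, m):
    """|m x k| as a set of ordered pairs."""
    return len({(y, x) for y in range(m) for x in range(k)})

# The definitions agree with grade-school addition and multiplication.
for k in range(6):
    for m in range(6):
        assert card_add(k, m) == k + m
        assert card_mul(k, m) == k * m
```

The tagging with {0} and {1} is exactly what makes the union disjoint; without it, {\kappa \cup \mu} would just be {\max(\kappa,\mu)}.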

This is a slight abuse of notation since we are using the same symbols as for ordinal arithmetic, even though the results are different ({\omega \cdot \omega = \omega^2} but {\aleph_0 \cdot \aleph_0 = \aleph_0}). In general, I’ll make it abundantly clear whether I am talking about cardinal arithmetic or ordinal arithmetic. To help combat this confusion, we use separate symbols for ordinals and cardinals. Specifically, {\omega} will always refer to {\{0,1,\dots\}} viewed as an ordinal; {\aleph_0} will always refer to the same set viewed as a cardinal. More generally,

Definition 20 Let {\omega_\alpha = \aleph_\alpha} viewed as an ordinal.

However, as we've seen already, {\aleph_0 \cdot \aleph_0 = \aleph_0}. In fact, this holds even more generally:

Theorem 21 (Infinite Cardinals Squared) Let {\kappa} be an infinite cardinal. Then {\kappa \cdot \kappa = \kappa}.

Proof: Obviously {\kappa \cdot \kappa \ge \kappa}, so we want to show {\kappa \cdot \kappa \le \kappa}.

The idea is to repeat the proof that we gave for {\aleph_0 \cdot \aleph_0 = \aleph_0}, so let us reiterate it here. We took the “square” of elements of {\aleph_0}, and then re-ordered it according to the diagonal:

\displaystyle \begin{array}{l|cccccc} & 0 & 1 & 2 & 3 & 4 & \dots \\ \hline 0 & 0 & 1 & 3 & 6 & 10 & \dots \\ 1 & 2 & 4 & 7 & 11 & \dots & \\ 2 & 5 & 8 & 12 & \dots & & \\ 3 & 9 & 13 & \dots & & & \\ 4 & 14 & \dots & & & & \end{array}
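For {\aleph_0}, the diagonal numbering in this table even has a closed form: the pair {(i,j)} sits at position {\tfrac{(i+j)(i+j+1)}{2} + i}. A purely illustrative sketch that regenerates the table and checks the numbering is a bijection:

```python
from itertools import product

def diag_index(i, j):
    """Position of the pair (i, j) in the diagonal ordering: all of
    diagonal i + j comes first, then ties are broken by row."""
    d = i + j
    return d * (d + 1) // 2 + i

# Regenerate the top row and left column of the table.
assert [diag_index(0, j) for j in range(5)] == [0, 1, 3, 6, 10]
assert [diag_index(i, 0) for i in range(5)] == [0, 2, 5, 9, 14]

# On any full set of diagonals, the numbering hits each index exactly once.
N = 30
indices = sorted(diag_index(i, j)
                 for i, j in product(range(N), range(N)) if i + j < N)
assert indices == list(range(N * (N + 1) // 2))
```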

Let’s copy this idea for a general {\kappa}.

We proceed by transfinite induction on {\kappa}. The base case is {\kappa = \aleph_0}, done above. For the inductive step, first we put the “diagonal” ordering {<_{\text{diag}}} on {\kappa \times \kappa} as follows: for {(\alpha_1, \beta_1)} and {(\alpha_2, \beta_2)} in {\kappa \times \kappa} we declare {(\alpha_1, \beta_1) <_{\text{diag}} (\alpha_2, \beta_2)} if

  • {\max \left\{ \alpha_1, \beta_1 \right\} < \max \left\{ \alpha_2, \beta_2 \right\}} (they are on different diagonals), or
  • {\max \left\{ \alpha_1, \beta_1 \right\} = \max \left\{ \alpha_2, \beta_2 \right\}} and {\alpha_1 < \alpha_2} (same diagonal).

Then {<_{\text{diag}}} is a well-ordering of {\kappa \times \kappa}, so we know it is in order-preserving bijection with some ordinal {\gamma}. Our goal is to show that {\left\lvert \gamma \right\rvert \le \kappa}. To do so, it suffices to prove that for any {\overline{\gamma} \in \gamma}, we have {\left\lvert \overline{\gamma} \right\rvert < \kappa}.

Suppose {\overline{\gamma}} corresponds to the point {(\alpha, \beta) \in \kappa \times \kappa} under this bijection. If {\alpha} and {\beta} are both finite, then certainly {\overline{\gamma}} is finite too. Otherwise, let {\overline{\kappa} = \max \left\{ \left\lvert \alpha \right\rvert, \left\lvert \beta \right\rvert \right\}}, an infinite cardinal with {\overline{\kappa} < \kappa}. Every point below {\overline{\gamma}} lies in the square {\left( \max\{\alpha,\beta\}+1 \right) \times \left( \max\{\alpha,\beta\}+1 \right)}, so the number of points below {\overline{\gamma}} is at most

\displaystyle \left\lvert \max\{\alpha,\beta\}+1 \right\rvert \cdot \left\lvert \max\{\alpha,\beta\}+1 \right\rvert = \overline{\kappa} \cdot \overline{\kappa} = \overline{\kappa}

by the inductive hypothesis. So {\left\lvert \overline{\gamma} \right\rvert \le \overline{\kappa} < \kappa} as desired. \Box

From this it follows that cardinal addition and multiplication are really boring:

Theorem 22 (Infinite Cardinal Arithmetic is Trivial) Given nonzero cardinals {\kappa} and {\mu}, one of which is infinite, we have

\displaystyle \kappa \cdot \mu = \kappa + \mu = \max\left( \kappa, \mu \right).

Proof: The point is that both of these are less than the square of the maximum. Writing out the details:

\displaystyle \begin{aligned} \max \left( \kappa, \mu \right) &\le \kappa + \mu \\ &\le \kappa \cdot \mu \\ &\le \max \left( \kappa, \mu \right) \cdot \max \left( \kappa, \mu \right) \\ &= \max\left( \kappa, \mu \right). \end{aligned}

\Box


5. Cardinal Exponentiation

Prototypical example for this section: {2^\kappa = \left\lvert \mathcal P(\kappa) \right\rvert}.

Definition 23 Suppose {\kappa} and {\lambda} are cardinals. Then

\displaystyle \kappa^\lambda \overset{\mathrm{def}}{=} \left\lvert \mathscr F(\lambda, \kappa) \right\rvert.

Here {\mathscr F(A,B)} is the set of functions from {A} to {B}.

As before, we are using the same notation for both cardinal and ordinal arithmetic. Sorry!

In particular, {2^\kappa = \left\lvert \mathcal P(\kappa) \right\rvert > \kappa}, and so from now on we can use the notation {2^\kappa} freely. (Note that this is totally different from ordinal arithmetic; there we had {2^\omega = \bigcup_{n\in\omega} 2^n = \omega}. In cardinal arithmetic {2^{\aleph_0} > \aleph_0}.)
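For a finite {\kappa}, the identification of {\mathcal P(\kappa)} with functions {\kappa \rightarrow \{0,1\}} (via indicator functions) can be checked directly; an illustrative sketch:

```python
from itertools import combinations, product

def power_set(k):
    """All subsets of the finite cardinal k = {0, ..., k-1}."""
    elems = range(k)
    return [frozenset(c) for r in range(k + 1) for c in combinations(elems, r)]

def functions_to_two(k):
    """All functions k -> {0, 1}, recorded as tuples of values."""
    return list(product((0, 1), repeat=k))

for k in range(6):
    # |P(k)| = |F(k, 2)| = 2^k ...
    assert len(power_set(k)) == len(functions_to_two(k)) == 2 ** k
    # ... and the indicator map S |-> (1 if i in S else 0) is a bijection.
    indicators = {tuple(int(i in s) for i in range(k)) for s in power_set(k)}
    assert indicators == set(functions_to_two(k))
```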

I have unfortunately not told you what {2^{\aleph_0}} equals. A natural conjecture is that {2^{\aleph_0} = \aleph_1}; this is called the Continuum Hypothesis. It turns out that this is undecidable — it is not possible to prove or disprove it from the {\mathsf{ZFC}} axioms.

6. Cofinality

Prototypical example for this section: {\aleph_0}, {\aleph_1}, \dots are all regular, but {\aleph_\omega} has cofinality {\omega}.

Definition 24 Let {\lambda} be a limit ordinal, and {\alpha} another ordinal. A map {f : \alpha \rightarrow \lambda} of ordinals is called cofinal if for every {\overline{\lambda} < \lambda}, there is some {\overline{\alpha} \in \alpha} such that {f(\overline{\alpha}) \ge \overline{\lambda}}. In other words, the map reaches arbitrarily high into {\lambda}.

Example 25 (Examples of Cofinal Maps) {\empty}

  1. The map {\omega \rightarrow \omega^\omega} by {n \mapsto \omega^n} is cofinal.
  2. For any ordinal {\alpha}, the identity map {\alpha \rightarrow \alpha} is cofinal.

Definition 26 Let {\lambda} be a limit ordinal. The cofinality of {\lambda}, denoted {\text{cof }(\lambda)}, is the smallest ordinal {\alpha} such that there is a cofinal map {\alpha \rightarrow \lambda}.

Ques 27 Why must {\alpha} be an infinite cardinal?

Usually, we are interested in taking the cofinality of a cardinal {\kappa}.

Pictorially, you can imagine standing at the bottom of the universe and looking up the chain of ordinals to {\kappa}. You have a machine gun and are firing bullets upwards, and you want to get arbitrarily high but less than {\kappa}. The cofinality is then the number of bullets you need to do this.

We now observe that “most” of the time, the cofinality of a cardinal is the cardinal itself. Such a cardinal is called regular.

Example 28 ({\aleph_0} is Regular) {\text{cof }(\aleph_0) = \aleph_0}, because no finite subset of {\omega} can reach arbitrarily high.

Example 29 ({\aleph_1} is Regular) {\text{cof }(\aleph_1) = \aleph_1}. Indeed, assume for contradiction that some countable set of ordinals {A = \{ \alpha_0, \alpha_1, \dots \} \subseteq \omega_1} reaches arbitrarily high inside {\omega_1}. Then {\Lambda = \cup A} is a countable ordinal, because it is a countable union of countable ordinals. In other words {\Lambda \in \omega_1}. But {\Lambda} is an upper bound for {A}, contradiction.

On the other hand, there are cardinals which are not regular; since these are the “rare” cases we call them singular.

Example 30 ({\aleph_\omega} is Not Regular) Notice that {\aleph_0 < \aleph_1 < \aleph_2 < \dots} reaches arbitrarily high in {\aleph_\omega}, despite only having {\aleph_0} terms. It follows that {\text{cof }(\aleph_\omega) = \aleph_0}.

We now confirm a suspicion you may have:

Theorem 31 (Successor Cardinals Are Regular) If {\kappa = \overline{\kappa}^+} is a successor cardinal, then it is regular.

Proof: We copy the proof that {\aleph_1} was regular.

Assume for contradiction that for some cardinal {\mu \le \overline{\kappa}}, there is a set of {\mu} ordinals reaching arbitrarily high in {\kappa}. Each of these ordinals, being less than {\kappa}, has cardinality at most {\overline{\kappa}}. Take the union of all {\mu} of them, which gives an ordinal {\Lambda} serving as an upper bound.

The number of elements in the union is at most

\displaystyle \#\text{ordinals} \cdot \#\text{elements} \le \mu \cdot \overline{\kappa} = \overline{\kappa}

and hence {\left\lvert \Lambda \right\rvert \le \overline{\kappa} < \kappa}. Since {\kappa} is a cardinal, this forces {\Lambda < \kappa}, contradicting the claim that our set reached arbitrarily high. \Box

7. Inaccessible Cardinals

So, what about limit cardinals? It seems that most of them are singular: if {\aleph_\lambda \ne \aleph_0} is a limit cardinal (so {\lambda} is a limit ordinal), then the sequence {\{\aleph_\alpha\}_{\alpha \in \lambda}} (of length {\lambda}) is certainly cofinal in {\aleph_\lambda}.

Example 32 (Aleph Fixed Point) Consider the monstrous cardinal

\displaystyle \kappa = \aleph_{\aleph_{\aleph_{\ddots}}}.

This might look frighteningly huge, as {\kappa = \aleph_\kappa}, but its cofinality is {\omega} as it is the limit of the sequence

\displaystyle \aleph_0, \aleph_{\aleph_0}, \aleph_{\aleph_{\aleph_0}}, \dots

More generally, one can in fact prove that

\displaystyle \text{cof }(\aleph_\lambda) = \text{cof }(\lambda).

But it is actually conceivable that {\lambda} is so large that {\left\lvert \lambda \right\rvert = \left\lvert \aleph_\lambda \right\rvert}.

A regular limit cardinal other than {\aleph_0} has a special name: it is weakly inaccessible. Such cardinals are so large that it is impossible to prove or disprove their existence in {\mathsf{ZFC}}. They are the first of many so-called “large cardinals”.

An infinite cardinal {\kappa} is a strong limit cardinal if

\displaystyle \forall \overline{\kappa} < \kappa \quad 2^{\overline{\kappa}} < \kappa

for any cardinal {\overline{\kappa}}. For example, {\aleph_0} is a strong limit cardinal.

Ques 33 Why must strong limit cardinals actually be limit cardinals? (This is offensively easy.)

A regular strong limit cardinal other than {\aleph_0} is called strongly inaccessible.

8. Exercises

Problem 1 Compute {\left\lvert V_\omega \right\rvert}.

Problem 2 Prove that for any limit ordinal {\alpha}, {\text{cof }(\alpha)} is a regular cardinal.

Problem 3 (Strongly Inaccessible Cardinals) Show that for any strongly inaccessible {\kappa}, we have {\left\lvert V_\kappa \right\rvert = \kappa}.

Problem 4 (König’s Theorem) Show that

\displaystyle \kappa^{\text{cof }(\kappa)} > \kappa

for every infinite cardinal {\kappa}.

(This post is a draft of a chapter from my Napkin project.)

Cauchy’s Functional Equation and Zorn’s Lemma

This is a draft of an appendix chapter for my Napkin project.

In the world of olympiad math, there’s a famous functional equation that goes as follows:

\displaystyle f : {\mathbb R} \rightarrow {\mathbb R} \qquad f(x+y) = f(x) + f(y).

Everyone knows what its solutions are! There’s an obvious family of solutions {f(x) = cx}. Then there’s also this family of… uh… noncontinuous solutions (mumble grumble) pathological (mumble mumble) Axiom of Choice (grumble).

Don’t worry, I know what I’m doing!

There’s also this thing called Zorn’s Lemma. It sounds terrifying, because it’s equivalent to the Axiom of Choice, which is also terrifying because why not.

In this post I will try to de-terrify these things, because they’re really not terrifying and I’m not sure why no one has bothered to explain this properly yet. I have yet to see an olympiad handout that explains how you would construct a pathological solution, even though it’s really quite natural. So let me fix this problem now…

1. Let’s Construct a Monster

Let’s try to construct a “bad” function {f} and see what happens.

By scaling, let’s assume WLOG that {f(1) = 1}. Thus {f(n) = n} for every integer {n}, and you can easily show from here that

\displaystyle f\left( \frac mn \right) = \frac mn.

So {f} is determined for all rationals. And then you get stuck.

None of this is useful for determining, say, {f(\sqrt 2)}. You could add and subtract rational numbers all day, and {\sqrt 2} is never going to show up at all.

Well, we’re trying to set things on fire anyways, so let’s set

\displaystyle f(\sqrt 2) = 2015

because why not? By the same induction, we get {f(n\sqrt2) = 2015n}, and then that

\displaystyle f\left( a + b \sqrt 2 \right) = a + 2015b.

Here {a} and {b} are rationals. Well, so far so good — as written, this is a perfectly good solution, other than the fact that we’ve only defined {f} on a tiny portion of the real numbers.

Well, we can do this all day:

\displaystyle f\left( a + b \sqrt 2 + c \sqrt 3 + d \pi \right) = a + 2015b + 1337c - 999d.

Perfectly consistent.
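One can check the consistency by computing entirely with coefficient tuples. An illustrative sketch (assuming, as discussed below, that {1, \sqrt 2, \sqrt 3, \pi} admit no rational linear relation, so a tuple {(a,b,c,d)} determines the number):

```python
from fractions import Fraction

# a + b*sqrt(2) + c*sqrt(3) + d*pi is represented by its coefficient
# tuple (a, b, c, d); uniqueness is the "independence" assumption.
WEIGHTS = (Fraction(1), Fraction(2015), Fraction(1337), Fraction(-999))

def f(coeffs):
    """f(a + b*sqrt2 + c*sqrt3 + d*pi) = a + 2015b + 1337c - 999d."""
    return sum(w * c for w, c in zip(WEIGHTS, coeffs))

def add(x, y):
    """Addition of real numbers, computed coordinate by coordinate."""
    return tuple(a + b for a, b in zip(x, y))

x = (Fraction(1, 2), Fraction(3), Fraction(0), Fraction(-2))
y = (Fraction(5), Fraction(-1, 7), Fraction(4), Fraction(1))

# Cauchy's equation holds coordinatewise...
assert f(add(x, y)) == f(x) + f(y)
# ...even though f is certainly not of the form f(x) = c*x.
```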

You can kind of see how we should keep going now. Just keep throwing in new real numbers which are “independent” to the previous few, assigning them to whatever junk we want. It feels like it should be workable…

In a moment I’ll explain what “independent” means (though you might be able to guess already), but at the moment there’s a bigger issue: no matter how many numbers we throw in, it seems like we’ll never finish. Let’s address this second issue first.

2. Review of Finite Induction

When you do induction, you get to count off {1}, {2}, {3}, … and so on. So for example, suppose we had a “problem” such as the following: Prove that the intersection of {n} open intervals is either {\varnothing} or an open interval. You can do this by induction easily: it’s true for {n = 2}, and for the larger cases it’s similarly easy.

But you can’t conclude from this that infinitely many open intervals intersect at some open interval. Indeed, this is false: consider the intervals

\displaystyle \left( -1, 1 \right), \quad \left( -\frac12, \frac12 \right), \quad \left( -\frac13, \frac13 \right), \quad \left( -\frac14, \frac14 \right), \quad \dots

This infinite set of intervals intersects at a single point {\{0\}}!
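The finite statement and its failure in the limit are easy to see numerically; a small sketch:

```python
from fractions import Fraction

def intersect(i1, i2):
    """Intersection of two open intervals, as (lo, hi); None if empty."""
    lo, hi = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (lo, hi) if lo < hi else None

# Finitely many of the intervals (-1/n, 1/n) always intersect in an
# honest open interval, namely the smallest one so far...
current = (Fraction(-1), Fraction(1))
for n in range(2, 100):
    current = intersect(current, (Fraction(-1, n), Fraction(1, n)))
    assert current == (Fraction(-1, n), Fraction(1, n))

# ...but the widths 2/n shrink to 0, so no open interval survives the
# infinite intersection: only the point 0 lies in every interval.
```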

The moral of the story is that induction doesn’t let us reach infinity. Too bad, because we’d have loved to use induction to help us construct a monster. That’s what we’re doing, after all — adding things in one by one.

3. Transfinite Induction

Well, it turns out we can, but we need a new notion of number.

For this we need a notion of an ordinal number. I defined these in their full glory in a previous blog post, but I don’t need the full details of that. Here’s what I want to say: after all the natural numbers

\displaystyle 0, \; 1, \; \dots,

I’ll put a new number called {\omega}, representing how large the natural numbers are. After that there are numbers called

\displaystyle \omega+1, \; \omega+2, \; \dots

and eventually a number called {2\omega}.

The list goes on:

\displaystyle \begin{aligned} 0, & 1, 2, 3, \dots, \omega \\ & \omega+1, \omega+2, \dots, \omega+\omega \\ & 2\omega+1, 2\omega+2, \dots, 3\omega \\ & \vdots \\ & \omega^2 + 1, \omega^2+2, \dots \\ & \vdots \\ & \omega^3, \dots, \omega^4, \dots, \omega^\omega \\ & \vdots, \\ & \omega^{\omega^{\omega^{\dots}}} \\ \end{aligned}

Pictorially, it kind of looks like this:


Anyways, in the same way that natural numbers dominate all finite sets, the ordinals dominate all the sets.

Theorem 1 For every set {S} there’s some ordinal {\alpha} which is bigger than it.

But it turns out (and you can intuitively see) that as large as the ordinals grow, there is no infinite descending chain. Meaning: if I start at an ordinal (like {2 \omega + 4}) and jump down, I can only take finitely many jumps before I hit {0}. (To see this, try writing down a chain starting at {2 \omega + 4} yourself.) Hence, induction and recursion still work verbatim:

Theorem 2 Given a statement {P(-)}, suppose that

  • {P(0)} is true, and
  • If {P(\alpha)} is true for all {\alpha < \beta}, then {P(\beta)} is true.

Then {P(\beta)} is true for every ordinal {\beta}.

Similarly, you’re allowed to do recursion to define {x_\beta} if you know the value of {x_\alpha} for all {\alpha < \beta}.

The difference from normal induction or recursion is that we’ll often only do things like “define {x_{n+1} = \dots}”. But this is not enough to define {x_\alpha} for all {\alpha}. To see this, try using our normal induction and see how far we can climb up the ladder.

Answer: you can’t get {\omega}! It’s not of the form {n+1} for any of our natural numbers {n} — our finite induction only lets us get up to the ordinals less than {\omega}. Similarly, the simple {+1} doesn’t let us hit the ordinal {2\omega}, even if we already have {\omega+n} for all {n}. Such ordinals are called limit ordinals. The ordinals that are of the form {\alpha+1} are called successor ordinals.
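For ordinals below {\omega^2}, which we can encode as pairs {(a,b)} standing for {a\omega + b} (a hypothetical encoding, just for illustration), the zero/successor/limit trichotomy is easy to compute:

```python
def classify(a, b):
    """Classify the ordinal a*omega + b, encoded as the pair (a, b)."""
    if a == 0 and b == 0:
        return "zero"
    if b > 0:
        # a*omega + b is (a*omega + (b - 1)) + 1, a successor.
        return "successor"
    # a*omega for a > 0 is not alpha + 1 for any ordinal alpha.
    return "limit"

assert classify(0, 0) == "zero"
assert classify(0, 7) == "successor"   # plain old 7
assert classify(1, 0) == "limit"       # omega itself
assert classify(1, 3) == "successor"   # omega + 3
assert classify(2, 0) == "limit"       # 2*omega, in the notation above
```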

So a transfinite induction or recursion is very often broken up into three cases. In the induction phrasing, it looks like

  • (Zero Case) First, resolve {P(0)}.
  • (Successor Case) Show that from {P(\alpha)} we can get {P(\alpha+1)}.
  • (Limit Case) Show that {P(\lambda)} holds given {P(\alpha)} for all {\alpha < \lambda}, where {\lambda} is a limit ordinal.

Similarly, transfinite recursion often is split into cases too.

  • (Zero Case) First, define {x_0}.
  • (Successor Case) Define {x_{\alpha+1}} from {x_\alpha}.
  • (Limit Case) Define {x_\lambda} from {x_\alpha} for all {\alpha < \lambda}, where {\lambda} is a limit ordinal.

In both situations, finite induction only does the first two cases, but if we’re able to do the third case we can climb far above the barrier {\omega}.

4. Wrapping Up Functional Equations

Let’s return to solving our problem.

Let {S_n} denote the set of “base” numbers we have at the {n}th step. In our example, we might have

\displaystyle S_1 = \left\{ 1 \right\}, \quad S_2 = \left\{ 1, \sqrt 2 \right\}, \quad S_3 = \left\{ 1, \sqrt 2, \sqrt 3 \right\}, \quad S_4 = \left\{ 1, \sqrt 2, \sqrt 3, \pi \right\}, \quad \dots

and we’d like to keep building up {S_i} until we can express all real numbers. For completeness, let me declare {S_0 = \varnothing}.

First, I need to be more precise about “independent”. Intuitively, this construction is working because

\displaystyle a + b \sqrt 2 + c \sqrt 3 + d \pi

is never going to equal zero for rational numbers {a}, {b}, {c}, {d} (other than all zeros). In general, a set {X} of numbers is “independent” if the combination

\displaystyle c_1x_1 + c_2x_2 + \dots + c_mx_m = 0

never occurs for rational numbers {c_i \in {\mathbb Q}} unless {c_1 = c_2 = \dots = c_m = 0}. Here {x_i \in X} are distinct. Note that even if {X} is infinite, I can only take finite sums! (This notion has a name: we want {X} to be linearly independent over {{\mathbb Q}}.)

When do we stop? We’d like to stop when we have a set {S_{\text{something}}} that’s so big, every real number can be written in terms of the independent numbers. (This notion also has a name: it’s called a {{\mathbb Q}}-basis.) Let’s call such a set spanning; we stop once we hit a spanning set.

The idea that we can induct still seems okay: suppose {S_\alpha} isn’t spanning. Then there’s some number that is independent of {S_\alpha}, say {\sqrt{2015}\pi} or something. Then we just add it to get {S_{\alpha+1}}. And we keep going.

Unfortunately, as I said before it’s not enough to be able to go from {S_\alpha} to {S_{\alpha+1}} (successor case); we need to handle the limit case as well. But it turns out there’s a trick we can do. Suppose we’ve constructed all the sets {S_0}, {S_1}, {S_2}, …, one for each natural number {n}, and none of them are spanning. The next thing I want to construct is {S_\omega}; somehow I have to “jump”. To do this, I now take the infinite union

\displaystyle S_\omega \overset{\text{def}}{=} S_0 \cup S_1 \cup S_2 \cup \dots.

The elements of this set are also independent (why?).

Ta-da! With the simple trick of “union all the existing sets”, we’ve just jumped the hurdle to the first limit ordinal {\omega}. Then we can construct {S_{\omega+1}}, {S_{\omega+2}}, …, and for the next limit we just do the same trick of “union-ing” all the previous sets.

So we can formalize the process as follows:

  1. Let {S_0 = \varnothing}.
  2. For a successor stage {S_{\alpha+1}}, add to {S_\alpha} any element independent of it, obtaining {S_{\alpha+1}}.
  3. For a limit stage {S_{\lambda}}, take the union {\bigcup_{\gamma < \lambda} S_\gamma}.
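The transfinitely many stages are the hard part, but the successor step by itself is ordinary linear algebra. Here is a finite-dimensional analogue of steps 1 and 2, over {{\mathbb Q}^4} rather than {{\mathbb R}} so that independence is actually decidable (the candidate vectors are made up for illustration):

```python
from fractions import Fraction

def is_independent(vectors):
    """Exact Gaussian elimination over Q: do the vectors have full rank?"""
    rows = [list(map(Fraction, v)) for v in vectors]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0),
                     None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank == len(rows)

def extend_to_basis(candidates, dim):
    """Successor steps only: greedily keep candidates that stay independent."""
    S = []
    for v in candidates:
        if is_independent(S + [v]):
            S.append(v)
        if len(S) == dim:
            break
    return S

candidates = [(1, 2, 3, 4), (2, 4, 6, 8), (0, 1, 0, 0),
              (1, 3, 3, 4), (0, 0, 1, 1), (0, 0, 0, 5)]
basis = extend_to_basis(candidates, 4)
assert len(basis) == 4 and is_independent(basis)
```

In {{\mathbb R}} over {{\mathbb Q}} the dimension is infinite and no countable candidate list suffices, which is exactly why the real construction needs the limit stages (and Choice).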

How do we know that we’ll stop eventually? Well, the thing is that this process consumes a lot of real numbers. In particular, the ordinals get larger than the size of {{\mathbb R}}. Hence if we don’t stop we will quite literally reach a point where we have used up every single real number. Clearly that’s impossible, because by then the elements can’t possibly be independent! (EDIT Dec 20 2015: To be clear, the claim that “ordinals get larger than the size of the reals” requires the Axiom of Choice; one can’t do this construction using transfinite induction alone. Thanks reddit for calling me out on this.)

So by transfinite recursion (and Choice), we eventually hit some {S_\gamma} which is spanning: the elements are all independent, but every real number can be expressed using it. Done! This set has a name: a Hamel basis.

Thus this problem is dead, dead, dead. Any questions?


5. Zorn’s Lemma

Now I can tell you what Zorn’s Lemma is: it lets us do the same thing in any poset.

We can think of the above example as follows: consider all sets of independent elements. These form a partially ordered set by inclusion, and what we did was quite literally climb up a chain

\displaystyle S_0 \subset S_1 \subset S_2 \subset \dots.

It’s not quite climbing since we weren’t just going one step at a time: we had to do “jumps” to get up to {S_\omega} and resume climbing. But the main idea is to climb up a poset until we’re at the very top; in the previous case, when we reached the spanning set.

The same thing works verbatim with any partially ordered set {\mathcal P}. Let’s define some terminology. A local maximum (or maximal element) of the entire poset {\mathcal P} is an element which has no other elements strictly greater than it.

Now a chain of length {\gamma} is a set of elements {p_\alpha} for every {\alpha < \gamma} such that {p_0 < p_1 < p_2 < \dots}. (Observe that a chain has a last element if and only if {\gamma} is a successor ordinal, like {\omega+3}.) An upper bound to a chain is an element {\tilde p} which is greater than or equal to all elements of the chain. In particular, if {\gamma} is a successor ordinal, then just taking the last element of the chain works.

In this language, Zorn’s Lemma states that

Theorem 3 (Zorn’s Lemma) Let {\mathcal P} be a nonempty partially ordered set. If every chain has an upper bound, then {\mathcal P} has a local maximum.

Chains with length equal to a successor ordinal always have upper bounds, but this is not true in the limit case. So the hypothesis of Zorn’s Lemma is exactly what lets us “jump” up to define {p_\omega} and other limit ordinals. And the proof of Zorn’s Lemma is straightforward: keep climbing up the poset at successor stages, using Zorn’s condition to jump up at limit stages, and thus building a really long chain. But we have to eventually stop, or we literally run out of elements of {\mathcal P}. And the only possible stopping point is a local maximum.

If we want to phrase our previous solution in terms of Zorn’s Lemma, we’d say: Proof: Look at the poset whose elements are sets of independent real numbers. Every chain {S_0 \subset S_1 \subset \dots} has an upper bound {\bigcup S_\alpha} (which you have to check is actually an element of the poset). Thus by Zorn, there is a local maximum {S}. Then {S} must be spanning, because otherwise we could add an element to it. \Box

So really, Zorn’s Lemma is encoding all of the work of climbing that I argued earlier. It’s a neat little package that captures all the boilerplate, and tells you exactly what you need to check.

One last thing you might ask: where is the Axiom of Choice used? Well, the idea is that for any chain there could be lots of {\tilde p}'s, and you need to pick one of them. Since you are making arbitrary choices infinitely many times, you need the Axiom of Choice. But really, it’s nothing special. [EDIT: AM points out that in order to talk about cardinalities in Theorem 1, one also needs the Axiom of Choice.]

6. Conclusion

In the words of Timothy Gowers,

If you are building a mathematical object in stages and find that (i) you have not finished even after infinitely many stages, and (ii) there seems to be nothing to stop you continuing to build, then Zorn’s lemma may well be able to help you.

Really, there’s nothing tricky at all here. People seem scared of Zorn’s Lemma, and claim it’s not intuitive or something. But really, all we’re doing is climbing up a poset. Nothing tricky at all.

Set Theory, Part 2: Constructing the Ordinals

This is a continuation of my earlier set theory post. In this post, I’ll describe the next three axioms of ZF and construct the ordinal numbers.

1. The Previous Axioms

As review, here are the natural descriptions of the five axioms we covered in the previous post.

Axiom 1 (Extensionality) Two sets are equal if they have the same elements.

Axiom 2 (Empty Set Exists) There exists an empty set {\varnothing} which contains no elements.

Axiom 3 (Pairing) Given two elements {x} and {y}, there exists a set {\{x,y\}} containing only those two elements. (It is permissible to have {x=y}, meaning that if {x} is a set then so is {\{x\}}.)

Axiom 4 (Union) Given a set {a}, we can create {\cup a}, the union of the elements of {a}. For example, if {a = \{ \{1,2\}, \{3,4\} \}}, then {\cup a = \{1,2,3,4\}} is a set.

Axiom 5 (Power Set) Given any set {x}, the power set {\mathcal P(x)} is a set.

I’ll comment briefly on what these let us do now. Let {V_0 = \varnothing}, and recursively define {V_{n+1} = \mathcal P(V_n)}. So for example,

\displaystyle  \begin{aligned} 	V_0 &= \varnothing \\ 	V_1 &= \{\varnothing\} \\ 	V_2 &= \{ \varnothing, \{\varnothing\} \} \\ 	V_3 &= \Big\{ \varnothing, \{\varnothing\}, 	\{\{\varnothing\}\}, \big\{\varnothing, \{\varnothing\} \big\}\Big\} \\ 	&\phantom=\vdots \end{aligned}
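These first few stages are small enough to build literally, with `frozenset` standing in for “set” (the sizes double-exponentiate: {\left\lvert V_5 \right\rvert = 2^{16} = 65536}). An illustrative sketch:

```python
def power_set(s):
    """All subsets of a finite set s, as frozensets."""
    subsets = [frozenset()]
    for x in s:
        subsets += [t | {x} for t in subsets]
    return frozenset(subsets)

V = [frozenset()]                  # V_0 = the empty set
for _ in range(4):
    V.append(power_set(V[-1]))     # V_{n+1} = P(V_n)

assert [len(v) for v in V] == [0, 1, 2, 4, 16]
# V_2 = { {}, {{}} }, matching the hand computation above.
assert V[2] == frozenset({frozenset(), frozenset({frozenset()})})
```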

Now let’s drop the formalism for a moment and go on a brief philosophical musing. Suppose we have a universe {V_\omega} (I’ll explain later what {\omega} is) where the only sets are those which appear in some {V_n}. You might then see, in fact, that the sets in {V_\omega} actually obey all five axioms above. What we’ve done is provide a model for which the five axioms are consistent.

But this is a pretty boring model right now for the following reason: even though there are infinitely many sets, there are no infinite sets. In a moment I’ll tell you how we can add new axioms to make infinite sets. But first let me tell you how we construct the natural numbers.

2. The Axiom of Foundation

We’re about to wade into the territory of the infinite, so first I need an axiom to protect us from really bad stuff from happening. What I’m going to do is forbid infinite {\in} chains.

Axiom 6 (Foundation) Loosely, there is no infinite chain of sets

\displaystyle  x_0 \ni x_1 \ni x_2 \ni \dots.

You can see why this seems reasonable: if I take a random set, I can hop into one of its elements. That’s itself a set, so I can jump into that guy and keep going down. In the finite universe {V_\omega}, you can see that eventually I’ll hit {\varnothing}, the very bottom of the universe. I want the same to still be true even if my sets are infinite.

This isn’t the actual statement of the axiom. The way to say this in machine code is that for any nonempty set {x}, there exists a {y \in x} such that {z \notin y} for any {z \in x}. We can’t actually write about something like {x_0 \ni x_1 \ni \dots} in machine code (yet). Nevertheless this suffices for our purposes.

There’s an important consequence of this.

Theorem 1 {x \notin x} for any set {x}.

Proof: For otherwise we would have {x \ni x \ni x \ni \dots} which violates Foundation. \Box

3. The Natural Numbers

Note: in set theory, {0} is considered a natural number.

Now for the fun part. If we want to encode math statements into the language of set theory, the first thing we’d want to do is encode the numbers {0}, {1}, {\dots} in there so that we can actually do arithmetic. How might we do that?

What we’re going to do is construct a sequence of sets of sizes {0}, {1}, {\dots} and let these correspond to the natural numbers. What sets should we choose? Well, there’s only one set of size {0}, so we begin by writing

\displaystyle  0 \overset{\text{def}}{=} \varnothing.

I’ll give away a little more than I should and then write

\displaystyle  1 \overset{\text{def}}{=} \{\varnothing\}, \quad 2 \overset{\text{def}}{=} \{\varnothing, \{\varnothing\} \}.

Now let’s think about {3}. If we want to construct a three-element set and we already have a two-element set, then we just need to add another element to {2} that’s not already in there. In other words, to construct {3} I just need to pick an {x} such that {x \notin 2}, then let {3 = \{x\} \cup 2}. (Or in terms of our axioms, {3 = \cup \left\{ 2, \{x\} \right\}}.) Now what’s an easy way to pick {x} such that {x \notin 2}? Answer: pick {x=2}. By the earlier theorem, we always have {2 \notin 2}.

Now the cat’s out of the bag! We define

\displaystyle  \begin{aligned} 	0 &= \varnothing \\ 	1 &= \left\{ 0 \right\} \\ 	2 &= \left\{ 0, 1 \right\} \\ 	3 &= \left\{ 0, 1, 2 \right\} \\ 	4 &= \left\{ 0, 1, 2, 3 \right\} \\ 	&\phantom= \vdots \end{aligned}

And there you have it: the nonnegative integers. You can have some fun with this definition and write things like

\displaystyle  \left\{ x \in 8 \mid x \text{ is even} \right\} = \left\{ 0, 2, 4, 6 \right\}

now. Deep down, everything’s a set.
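The whole construction fits in a few lines, with `frozenset` again playing the role of “set”; a sketch:

```python
def von_neumann(n):
    """The encoding n = {0, 1, ..., n-1}: start at 0 and repeat n+1 = n u {n}."""
    num = frozenset()
    for _ in range(n):
        num = num | {num}
    return num

eight = von_neumann(8)

# Each numeral has exactly n elements, and m < n exactly when m is in n.
assert len(eight) == 8
assert von_neumann(3) in eight
assert von_neumann(8) not in eight   # by the theorem above, 8 is not in 8

# {x in 8 | x is even} = {0, 2, 4, 6}, as sets:
evens = {von_neumann(k) for k in range(0, 8, 2)}
assert evens < eight
```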

4. Finite Ordinals

We’re able to do some counting now, because we’ve defined the sequence of sets

\displaystyle  0, 1, 2, \dots

by {0 = \varnothing} and {n+1 = \{0,\dots,n\}}. This sequence is related by

\displaystyle  0 \in 1 \in 2 \in \dots.

Some properties of these “numbers” I’ve made are:

  • They are well-ordered by {\in} (which corresponds exactly with the {<} we're familiar with; that's a good motivation for choosing this construction, as the well-ordering property is one of the most important properties of {\mathbb N}, and using {\in} for this purpose lets us do this ordering painlessly). That means if I take the elements of {n}, then I can sort them in a transitive chain like I've done above: for any {x} and {y}, either {x \in y} or {y \in x}. For example, the elements of {4} are {0}, {1}, {2}, {3} and {0 \in 1 \in 2 \in 3}. It also means that any subset has a “minimal” element, which would just be the first element of the chain.
  • The set {n} is transitive. What this means is that it is a “closed universe” in the sense that if I look at an element {a} of {n}, all the elements of {a} are also in {n}. For example, if I take the element {3} of {5 = \{0,1,2,3,4\}}, all the elements of {3} are in {5}. Looking deeply into {n} won’t find me anything I didn’t see at the top level.

    In other words, a set {S} is transitive if for every {T \in S}, {T \subseteq S}.
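Both properties are mechanically checkable for the finite sets we have built so far. An illustrative sketch, re-using the von Neumann encoding from the previous section:

```python
def von_neumann(n):
    num = frozenset()
    for _ in range(n):
        num = num | {num}    # n + 1 = n u {n}
    return num

def is_transitive(s):
    """Every element of s is also a subset of s."""
    return all(t <= s for t in s)

def is_in_total(s):
    """Any two distinct elements are comparable under membership."""
    return all(x in y or y in x for x in s for y in s if x != y)

five = von_neumann(5)
assert is_transitive(five) and is_in_total(five)

# Transitivity alone is not enough: {0, 1, {1}} is transitive, but 0 and
# {1} are not comparable under membership.
zero, one = von_neumann(0), von_neumann(1)
weird = frozenset({zero, one, frozenset({one})})
assert is_transitive(weird) and not is_in_total(weird)
```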

A set which is both transitive and well-ordered by {\in} is called an ordinal, and the numbers {0,1,2,\dots} are precisely the finite ordinals. But now I’d like to delve into infinite numbers. Can I define some form of “infinity”?

5. Infinite Ordinals

To tell you what a set is, I only have to tell you who its elements are. And so I’m going to define the set

\displaystyle  \omega = \left\{ 0, 1, 2, \dots \right\}.

And now our counting looks like this.

\displaystyle  0, 1, 2, \dots, \omega.

We just tacked on an infinity at the end by scooping up all the natural numbers and collecting them in a set. You can do that? Sure you can! All I’ve done is written down the elements of the set, and you can check that {\omega} is indeed an ordinal.

Well, okay, there’s one caveat. We don’t actually know whether the {\omega} I’ve written down is a set. Pairing and Union let us collect any finite collection of sets into a single set, but they don’t let us collect things into an infinite set. In fact, you can’t prove from the axioms so far that {\omega} is a set.

For this I’m going to present another two axioms. These are much more technical to describe, so I’ll lie to you about what their statements are. If you’re interested in the exact statements, consult the lecture notes linked at the bottom of this post.

Axiom 7 (Replacement) Loosely, let {f} be a function on a set {X}. Then the image of {f} is a set:

\displaystyle  \exists Y : \quad \forall y, \; y \in Y \iff \exists x \in X : f(x) = y.
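To see the shape of this axiom on a finite example (my own stand-in; the real content of Replacement is for infinite sets), the image of a set under a function is again a set:

```python
# Finite stand-in for Replacement, with frozensets as sets.
# succ plays the role of the function f; the comprehension is "the image".

def succ(x):
    """Von Neumann successor: x ∪ {x}."""
    return x | frozenset({x})

X = [frozenset()]             # 0
for _ in range(4):
    X.append(succ(X[-1]))     # 1, 2, 3, 4

image = frozenset(succ(n) for n in X)   # {f(x) : x ∈ X}, as one set
print(len(image))             # 5 distinct successors
```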

Axiom 8 (Infinity) There exists a set {\omega = \{0,1,2,\dots\}}.

With these two axioms, we can now write down the first infinite ordinal {\omega}. So now our list of ordinals is

\displaystyle  0, 1, 2, \dots, \omega.

But now in the same way we constructed {3} from {2}, we can construct a set

\displaystyle  \omega + 1 \overset{\text{def}}{=} 	\omega \cup \{\omega\} 	= \left\{ 0, 1, \dots, \omega \right\}

and then

\displaystyle  \omega + 2 \overset{\text{def}}{=} 	(\omega+1) \cup \{\omega+1\} 	= \left\{ 0, 1, \dots, \omega, \omega+1 \right\}

and so on — all of these are also transitive and well-ordered by {\in}. So now our list of ordinals is

\displaystyle  0, 1, 2, \dots, \omega, \omega+1, \omega+2, \dots.

Well, can we go on further? What we’d like is to define an “{\omega+\omega} or {2 \cdot \omega}”, which would entail capturing all of the above elements into a set. Well, I claim we can do so. Consider a function {f} defined on {\omega} which sends {n} to {\omega+n}. So {f(0) = \omega}, {f(3) = \omega+3}, {f(2014) = \omega+2014}. (Strictly, I have to prove that the set-encoding of this function, namely {\{(0,\omega), (1,\omega+1), \dots\}}, is actually a set. But let’s put that aside for now.) Then Replacement tells me that I have a set

\displaystyle  	\left\{ f(0), f(1), \dots \right\} 	= \left\{ \omega, \omega+1, \omega+2, \dots \right\}

From here we can union this set with {\omega} to obtain the set {\left\{ 0,1,2,\dots,\omega,\omega+1,\omega+2,\dots \right\}}. And we can keep turning this wheel repeatedly, yielding the ordinal numbers.

\displaystyle  \begin{aligned} 	0, & 1, 2, 3, \dots, \omega \\ 	& \omega+1, \omega+2, \dots, \omega+\omega \\ 	& 2\omega+1, 2\omega+2, \dots, 3\omega \\ 	& \vdots \\ 	& \omega^2 + 1, \omega^2+2, \dots \\ 	& \vdots \\ 	& \omega^3, \dots, \omega^4, \dots, \omega^\omega \\ 	& \vdots, \\ 	& \omega^{\omega^{\omega^{\dots}}} \\ \end{aligned}

I won’t say much more about these ordinal numbers since the post is already getting pretty long, but I’ll mention that the ordinals might not correspond to a type of counting that you’re used to, in the sense that there is a bijection between the sets {\omega} and {\omega+1}. It might seem like different numbers should have different “sizes”. For this you stumble into the cardinal numbers: it turns out that a cardinal number is just defined as an ordinal number which is not in bijection with any smaller ordinal number.
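To make the bijection claim concrete, here is one explicit choice (my own example; many others work): send {0} to the new top element and shift everything else down by one,

\displaystyle  f : \omega \to \omega + 1, \qquad f(0) = \omega, \qquad f(n) = n - 1 \text{ for } n \ge 1.

Every element of {\omega+1} is hit exactly once, so {\omega} and {\omega+1} have the same “size” even though one looks longer.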

6. A Last Few Axioms

I’ll conclude this exposition on ZFC with a few last axioms. First is the axiom called Comprehension, though it actually can be proven from the Replacement axiom.

Axiom 9 (Comprehension) Let {\phi} be some property, and let {S} be a set. Then the notion

\displaystyle  \left\{ x \in S \mid \phi(x) \right\}

is a set. More formally,

\displaystyle  \exists X \forall x : \; x \in X \iff \left( (x \in S) \land \phi(x) \right).

Notice that the comprehension must be restricted: we can only take subsets of existing sets. From this one can deduce that there is no set of all sets; if we had such a set {V}, then we could use Comprehension to generate {\{x \in V : x \notin x\}}, recovering Russell’s Paradox.

So anyway, this means that we can take set comprehensions.
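That remark can be made literal in a toy Python model (mine, not the post’s): a set comprehension only ever filters a set you already have, which is exactly the restriction Comprehension imposes.

```python
# {x ∈ S | φ(x)} in miniature: we may filter an existing set S,
# but we may not conjure "the set of all x with φ(x)" out of nothing.

S = frozenset(range(10))
evens = frozenset(x for x in S if x % 2 == 0)
print(sorted(evens))   # [0, 2, 4, 6, 8]
```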

Finally, I’ll touch a little on why the Axiom of Choice is actually important. You’ve probably heard the phrasing that “you can pick a sock out of every drawer” or some cute popular math phrasing like that. Here’s the precise statement.

Axiom 10 (Choice) Let {\mathcal F} be a set such that {\varnothing \notin \mathcal F}. Then we can define a function {g} on {\mathcal F} such that {g(y) \in y} for every {y \in \mathcal F}. The function {g} is called a choice function; given a bunch of sets {\mathcal F}, it chooses an element {g(y)} out of every {y}. In other words, for any {\mathcal F} with {\varnothing \notin \mathcal F}, there exists a set

\displaystyle  \left\{ (y,g(y)) \mid y \in \mathcal F \right\}

with {g(y) \in y} for every {y}.

In light of the discussion in these posts, what’s significant is not that we can conceive such a function (how hard can it be to take an element out of a nonempty set?) but that the resulting structure is actually a set. The whole point of having the ZF axioms is that we have to be careful about how we are allowed to make new sets out of old ones, so that we don’t run ourselves into paradoxes. The Axiom of Choice reflects that this is a subtle issue.
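For a finite family, no axiom is needed and the construction is easy to sketch (my own illustration, with `min` as one arbitrary way of choosing); the axiom’s content is that the same object exists even when {\mathcal F} is infinite and no rule like `min` is available:

```python
# A choice function g on a finite family F of nonempty sets,
# together with its graph {(y, g(y)) | y ∈ F}.

F = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]
assert all(len(y) > 0 for y in F)     # ∅ ∉ F

g = {y: min(y) for y in F}            # choose the least element of each y
graph = {(y, g[y]) for y in F}        # the set the axiom asserts exists

print(all(g[y] in y for y in F))      # True: g(y) ∈ y for every y
```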

So there you have it, the axioms of ZFC and what they do.

Thanks to Peter Koellner, Harvard, for his class Math 145a, which taught me this material. My notes for this course can be downloaded from my math website.

Thanks to J Wu for pointing out a typo in Replacement and noting that I should emphasize how {\in} leads to the well-ordering for the ordinals.

Set Theory, Part 1: An Intro to ZFC

Back in high school, I sometimes wondered what all the big deal about ZFC and the Axiom of Choice was, but I never really understood what I read in the corresponding Wikipedia page. In this post, I’ll try to explain what axiomatic set theory is trying to do in a way accessible to those with just a high school background.

1. Motivation

What we’re going to do is try to lay out something like a “machine code” for math: a way of making math completely rigorous, to the point where it can be verified by a machine. This would make sure that the foundation on which we do our high-level theorem proving is sound. As we’ll see in just a moment, this is actually a lot harder to do than it sounds — there are some traps if we try to play too loosely with our definitions.

First of all, since this is a set theory post, the first thing we need to do is define exactly what a set is. Well, we know what a set is, right? It’s just a collection of objects, and two sets are the same if they have the same objects. For example, we have our empty set {\varnothing} that contains no objects. We can have a set like {\{1, 2, 3\}}, or maybe the set of natural numbers {\mathbb N = \{0, 1, 2, \dots \}}. (For the purposes of set theory, {0} is usually considered a natural number.) Sets can even contain other sets, like {\left\{ \mathbb Z, \mathbb Q, \mathbb N \right\}}. Fine and dandy, right?

The trouble is that this definition actually isn’t good enough, and here’s why. If we just say “a set is any collection of objects”, then we can consider a really big set {S}, the set of all sets. So far no problem, right? {S} has the oddity that it has to contain itself {S \in S}, but oh well, no big deal. In fact, for this section, let’s even give a name to these sets — we’ll call a set extraordinary if it happens to contain itself, and ordinary otherwise. (Note that this isn’t actually a math term, it’s just for my convenience.)

Now here comes the problem, called Russell’s Paradox (link here). If I can look at the set of all sets {S}, I can also probably look at the set of all ordinary sets, which I’ll call {X}. Here’s the question: is {X} ordinary?

  • Well, if {X} is ordinary, then {X \in X}. Hence {X} is extraordinary by definition, which is impossible.
  • Now suppose {X} is extraordinary. Then {X} is not ordinary, so that implies {X \notin X}. But that contradicts the assumption that {X} is extraordinary!

So that’s all a contradiction! Just by considering the set of all ordinary sets, we’ve run into a trap.

Now if you’re not a set theorist, you could probably just brush this off, saying “oh well, I guess you can’t look at certain sets”. But if you’re a set theorist, this worries you, because you realize it means that you can’t just define a set as “a collection of objects”, because then everything would blow up. Something more is necessary.

2. The Language of Set Theory

We need a way to refer to sets other than “collection of objects”. So here’s what we’re going to do. We’ll start by defining a formal language of set theory, a way of writing logical statements. First of all we can throw in our usual logical operators: {\forall}, “for all”; {\exists}, “there exists”; {=}, equals; and {X \implies Y}, our “if {X} then {Y}”. Let’s also add in {\land}, which means “and”, and {\lor}, which means “or”, as well as {\neg}, which means “not”. Now we can write down logical statements.

Since we’re doing set theory, there’s one more operator we’ll add in: the inclusion {\in}. And that’s all we’re going to use (for now).

So how do we express something like “the set {\{1, 2\}}”? The trick is that we’re not going to actually “construct” any sets, but rather refer to them indirectly, like so:

\displaystyle  \exists S : x \in S \iff \left( (x=1) \lor (x=2) \right).

“There exists an {S} such that {x} is in {S} if and only if {x} equals {1} or {x} equals {2}”. We don’t have to refer to sets as objects in and of themselves anymore — we now have a way to “create” our sets, by writing formulas for exactly what they contain. This is something a machine can parse.

Well, what are we going to do with things like {1} and {2}, which are not sets? Here’s the answer: we’re going to make everything into a set. Natural numbers will be sets. Ordered pairs will be sets. Functions will be sets. In later posts, I’ll tell you exactly how we manage to do something like encode {1} as a set. For now, all you need to know is that sets don’t just hold objects; they hold other sets.

So now it makes sense to talk about whether something is a set or not: {\exists x} means “{x} is a set”, while {\nexists x} means “{x} is not a set”. In other words, we’ve rephrased the problem of deciding whether something is a set to whether it exists, which makes it easier to deal with in our formal language. That means that our axiom system had better find some way to let us show a lot of things exist, without letting us prove the following formula:

\displaystyle  \exists X : x \in X \iff x \notin x.

For if we prove this formula, then we have our “bad” set from Russell’s Paradox that caused us to go down the rabbit hole in the first place.

3. The Axioms of ZF

What the axioms of ZF do is tell a computer exactly what it’s allowed to do. Since the axioms want to be perfectly crystal clear, they’re all written in formal symbols. Humans don’t actually think in these formal symbols, unless they’re set theorists and have also gone insane. But the point of these symbols is that there is absolutely no confusion about what is and isn’t true.

It’s worth noting that there are several versions of ZF, but they’re all equivalent to one another. Also, we’ll start representing sets with lowercase letters below.

First, we agreed before that two sets are the same if they share the same elements. We’d like to encode that in an axiom: {x=y} if and only if for every {a}, {a \in x \iff a \in y}. (And now you’re starting to see how convenient it is that everything’s a set!)

This leads us to the first axiom of ZF, called Extensionality.

Axiom 1 (Extensionality) {\forall x \forall y \forall a : \left( a \in x \iff a \in y \right) \implies x = y}.

This is machine code for “if {x} and {y} have the property that {a \in x \iff a \in y}, then {x = y}”. Now our computer can recognize that two sets are equal if they share exactly the same elements. Great.

Unfortunately, our computer still doesn’t know that there even exist any sets. We have to tell it that. So what’s the first thing we tell it? There exists the empty set.

Axiom 2 (Empty Set Exists) {\exists a : \forall x \; \neg (x \in a)}.

Now you notice that I have {\neg (x \in a)} rather than the {x \notin a} that we’re all used to. That’s going to get tiring very soon, so we’ll make our first “shortcut”: whenever we write {x \notin a}, we really mean {\neg (x \in a)}. This makes it easier for us humans to read the formula, but we’re happy because in principle, we could just expand this shortcut and give the computer something to read.

Now our computer knows the empty set is a set. But does our computer know that there’s only one? In fact, yes! If we have two sets {a} and {b}, and both of them have no elements, then our computer can prove {a=b} by using Extensionality now. So in fact, there’s only one empty set. Now we’ll take another shortcut and give this set a name, “{\varnothing}”.

The next three axioms provide us with some various ways of building more sets.

Axiom 3 (Pairing) Given two elements {x} and {y}, there exists a set {a} containing only those two elements. In machine code,

\displaystyle  \forall x \forall y \exists a \quad \forall z, \; z \in a \iff \left( (z=x) \lor (z=y) \right).

Note that {x} and {y} do not have to be different! What that means is that our machine can now build a set by taking {x = y = \varnothing}. Pairing gives a set {a} such that {z \in a} if and only if {z = \varnothing}. In our human mind, what we’re thinking is “we’ve built the set {a = \{\varnothing\}}!” No need for the computer to know that, though.

We can then apply Pairing again to build some more sets, like {\{\{\varnothing\}\}}, or {\{\varnothing, \{\varnothing\}\}}. Can you see how? And now if I use pairing on those two sets, I get {\{ \{\varnothing, \{\varnothing\}\}, \{\{\varnothing\}\}\}}. Yippee!
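Here is the same game played in a toy Python model (my own; frozensets stand in for sets), so you can watch the braces pile up:

```python
# Pairing as an operation: pair(x, y) = {x, y}, which is {x} when x == y.

def pair(x, y):
    return frozenset({x, y})

empty = frozenset()                   # ∅
s1 = pair(empty, empty)               # {∅}
s2 = pair(s1, s1)                     # {{∅}}
s3 = pair(empty, s1)                  # {∅, {∅}}
s4 = pair(s3, s2)                     # {{∅, {∅}}, {{∅}}}
print(len(s1), len(s3), len(s4))      # 1 2 2
```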

At this point I’m going to cheat and use {0,1,2,\dots} for some examples as natural “objects”, since all those braces are getting hard to read. All you need to know at this point is that these guys are secretly sets too, and you’ll have to wait until next time for me to tell you what their elements are.

So Pairing tells you that if I have my hands on {1}, and I have my hands on {2}, then I can get my hands on the set {\{1,2\}}. Cool. Now suppose I told you: {a = \{1,2\}} and {b = \{3,4\}}. Or rather, the computer told you

\displaystyle  \exists a : x \in a \iff \left( (x=1) \lor (x=2) \right)


\displaystyle  \exists b : x \in b \iff \left( (x=3) \lor (x=4) \right).

Now you want to construct the set {\{1,2,3,4\}}. You might try pairing, but that just gives you the set {\{ \{1,2\}, \{3,4\} \}}. It turns out we’re actually super clumsy: our axioms so far don’t even let us get our hands on {1} and {2}, even though we have a set that contains them!

Maybe this is actually not that surprising. If I write down the formula

\displaystyle  \exists a : x \in a \iff \left( (x=1) \lor (x=2) \lor (x=\text{unicorn}) \right)

but I know there are no unicorns, then I shouldn’t expect to actually be able to prove that {\text{unicorn}} exists. However, I’d really like to ignore that and get the set {\{1,2,3,4\}}. The axiom that lets us do that is called Union.

Axiom 4 (Union) Given a set {a}, we can create {\cup a}, the union of the elements of {a}. For example, if {a = \{ \{1,2\}, \{3,4\} \}}, then {z = \{1,2,3,4\}} is a set. Formally,

\displaystyle  \forall a \exists z \quad \forall x \; (x \in z) \iff (\exists y : x \in y \in a).

By repeatedly using Pairing and Union, we can take any finite collection of elements and gather them into one set.
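In the same toy model (mine), Union flattens exactly one level, which is all you need to turn {\{\{1,2\},\{3,4\}\}} into {\{1,2,3,4\}}; like the post, I cheat here by using plain numbers as elements:

```python
# ∪a: everything that is an element of an element of a.

def union(a):
    out = frozenset()
    for y in a:
        out |= y          # fold in each element-of-an-element
    return out

a = frozenset({frozenset({1, 2}), frozenset({3, 4})})
print(sorted(union(a)))   # [1, 2, 3, 4]
```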

Finally, we’ll also give our computer a way to build a power set. Recall that the power set of a set {S}, which I write {\mathcal P(S)}, consists of all its subsets. For example, in human world, we had {\mathcal P(\{1,2\}) = \{\varnothing, \{1\}, \{2\}, \{1,2\}\}}.

Axiom 5 (Power Set) We can construct {\mathcal P(x)}. Formally,

\displaystyle  \forall x \exists a \forall y (y \in a \iff y \subseteq x)

where {y \subseteq x} is short for {\forall z (z \in y \implies z \in x)}.
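For finite sets the power set can simply be computed; here is a sketch of mine using `itertools.combinations`:

```python
from itertools import combinations

def powerset(x):
    """All subsets of the finite frozenset x, as a frozenset of frozensets."""
    elems = list(x)
    return frozenset(
        frozenset(c)
        for r in range(len(elems) + 1)
        for c in combinations(elems, r)
    )

p = powerset(frozenset({1, 2}))
print(len(p))   # 4: ∅, {1}, {2}, {1, 2}
```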

4. Closing

That lays down the first five axioms of ZF.

Though I don’t have the space to show you how to encode the integers here, I’ll give you an idea for some ways we can hack together “encodings” of things as sets. For example, what if we want an ordered pair {(x,y)}? I claim we can encode this as

\displaystyle  (x,y) \overset{\text{def}}{=} \left\{ \{x\}, \{x,y\} \right\}.

Indeed, you can check that {(x_1, y_1) = (x_2, y_2)} if and only if {x_1 = x_2} and {y_1 = y_2}, which is what we wanted. You can see that this is indeed a set by some applications of Pairing.
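You can also check the claim mechanically in the frozenset toy model (my own sketch): Kuratowski pairs agree exactly when their coordinates do.

```python
# (x, y) encoded as {{x}, {x, y}}; note the degenerate case (x, x) = {{x}}.

def kpair(x, y):
    return frozenset({frozenset({x}), frozenset({x, y})})

print(kpair(1, 2) == kpair(1, 2))                   # True
print(kpair(1, 2) == kpair(2, 1))                   # False: order matters
print(kpair(1, 1) == frozenset({frozenset({1})}))   # True
```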

Extending this, if we have a relation {\sim}, we can encode it as

\displaystyle  {\sim} \overset{\text{def}}{=} \left\{ (x,y) \mid x \sim y \right\}.

A function {f} is a special type of relation such that if {(x,y_1) \in f} and {(x,y_2) \in f}, then {y_1 = y_2}. In other words, we represent functions by their “graphs” of ordered pairs {(x, f(x))}.
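One last sketch (mine; plain tuples stand in for the Kuratowski encoding): testing whether a relation, given as its graph, really is a function.

```python
def is_function(rel):
    """A relation is a function iff no x is paired with two different y's."""
    seen = {}
    for x, y in rel:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True

f = {(1, 'a'), (2, 'b')}      # a function
r = {(1, 'a'), (1, 'b')}      # a relation, but not a function
print(is_function(f), is_function(r))   # True False
```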

In the next post, I’ll explain some of the more involved axioms, and I’ll also finally tell you what the elements of {1} and {2} are. Specifically, I’ll be constructing the ordinal numbers.

Thanks to Peter Koellner, Harvard, for his class Math 145a, which taught me this material. My notes for this course can be downloaded from my math website.