A trailer for p-adic analysis, second half: Mahler coefficients

In the previous post we defined ${p}$-adic numbers. This post will state (mostly without proof) some more surprising results about continuous functions ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$. Then we give the famous proof of the Skolem-Mahler-Lech theorem using ${p}$-adic analysis.

1. Digression on ${\mathbb C_p}$

Before I go on, I want to mention that ${\mathbb Q_p}$ is not algebraically closed. So, we can take its algebraic closure ${\overline{\mathbb Q_p}}$ — but this field is now no longer complete (in the topological sense). However, we can then take the completion of this space to obtain ${\mathbb C_p}$. In general, completing an algebraically closed field remains algebraically closed, and so there is a larger space ${\mathbb C_p}$ which is algebraically closed and complete. This space is called the ${p}$-adic complex numbers.

We won’t need ${\mathbb C_p}$ at all in what follows, so you can forget everything you just read.

2. Mahler coefficients: a description of continuous functions on ${\mathbb Z_p}$

One of the big surprises of ${p}$-adic analysis is that we can concretely describe all continuous functions ${\mathbb Z_p \rightarrow \mathbb Q_p}$. They are given by a basis of functions

$\displaystyle \binom xn \overset{\mathrm{def}}{=} \frac{x(x-1) \dots (x-(n-1))}{n!}$

in the following way.

Theorem 1 (Mahler; see Schikhof Theorem 51.1 and Exercise 51.B)

Let ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$ be continuous, and define

$\displaystyle a_n = \sum_{k=0}^n \binom nk (-1)^{n-k} f(n). \ \ \ \ \ (1)$

Then ${\lim_n a_n = 0}$ and

$\displaystyle f(x) = \sum_{n \ge 0} a_n \binom xn.$

Conversely, if ${a_n}$ is any sequence converging to zero, then ${f(x) = \sum_{n \ge 0} a_n \binom xn}$ defines a continuous function satisfying (1).

The ${a_i}$ are called the Mahler coefficients of ${f}$.

Exercise 2

Last post we proved that if ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$ is continuous and ${f(n) = (-1)^n}$ for every ${n \in \mathbb Z_{\ge 0}}$ then ${p = 2}$. Re-prove this using Mahler’s theorem, and this time show conversely that a unique such ${f}$ exists when ${p=2}$.

You’ll note that these are the same finite differences that one uses on polynomials in high school math contests, which is why they are also called “Mahler differences”.

\displaystyle \begin{aligned} a_0 &= f(0) \\ a_1 &= f(1) - f(0) \\ a_2 &= f(2) - 2f(1) - f(0) \\ a_3 &= f(3) - 3f(2) + 3f(1) - f(0). \end{aligned}

Thus one can think of ${a_n \rightarrow 0}$ as saying that the values of ${f(0)}$, ${f(1)}$, \dots behave like a polynomial modulo ${p^e}$ for every ${e \ge 0}$. Amusingly, this fact was used on a USA TST in 2011:

Exercise 3 (USA TST 2011/3)

Let ${p}$ be a prime. We say that a sequence of integers ${\{z_n\}_{n=0}^\infty}$ is a ${p}$-pod if for each ${e \geq 0}$, there is an ${N \geq 0}$ such that whenever ${m \geq N}$, ${p^e}$ divides the sum

$\displaystyle \sum_{k=0}^m (-1)^k \binom mk z_k.$

Prove that if both sequences ${\{x_n\}_{n=0}^\infty}$ and ${\{y_n\}_{n=0}^\infty}$ are ${p}$-pods, then the sequence ${\{x_n y_n\}_{n=0}^\infty}$ is a ${p}$-pod.

3. Analytic functions

We say that a function ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$ is analytic if it has a power series expansion

$\displaystyle \sum_{n \ge 0} c_n x^n \quad c_n \in \mathbb Q_p \qquad\text{ converging for } x \in \mathbb Z_p.$

As before there is a characterization in terms of the Mahler coefficients:

Theorem 4 (Schikhof Theorem 54.4)

The function ${f(x) = \sum_{n \ge 0} a_n \binom xn}$ is analytic if and only if

$\displaystyle \lim_{n \rightarrow \infty} \frac{a_n}{n!} = 0.$

Just as holomorphic functions have finitely many zeros, we have the following result on analytic functions on ${\mathbb Z_p}$.

Theorem 5 (Strassmann’s theorem)

Let ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$ be analytic. Then ${f}$ has finitely many zeros.

4. Skolem-Mahler-Lech

We close off with an application of the analyticity results above.

Theorem 6 (Skolem-Mahler-Lech)

Let ${(x_i)_{i \ge 0}}$ be an integral linear recurrence. Then the zero set of ${x_i}$ is eventually periodic.

Proof: According to the theory of linear recurrences, there exists a matrix ${A}$ such that we can write ${x_i}$ as a dot product

$\displaystyle x_i = \left< A^i u, v \right>.$

Let ${p}$ be a prime not dividing ${\det A}$. Let ${T}$ be an integer such that ${A^T \equiv \mathbf{1} \pmod p}$.

Fix any ${0 \le r < N}$. We will prove that either all the terms

$\displaystyle f(n) = x_{nT+r} \qquad n = 0, 1, \dots$

are zero, or at most finitely many of them are. This will conclude the proof.

Let ${A^T = \mathbf{1} + pB}$ for some integer matrix ${B}$. We have

\displaystyle \begin{aligned} f(n) &= \left< A^{nT+r} u, v \right> = \left< (\mathbf1 + pB)^n A^r u, v \right> \\ &= \sum_{k \ge 0} \binom nk \cdot p^n \left< B^n A^r u, v \right> \\ &= \sum_{k \ge 0} a_n \binom nk \qquad \text{ where } a_n = p^n \left< B^n A^r u, v \right> \in p^n \mathbb Z. \end{aligned}

Thus we have written ${f}$ in Mahler form. Initially, we define ${f \colon \mathbb Z_{\ge 0} \rightarrow \mathbb Z}$, but by Mahler’s theorem (since ${\lim_n a_n = 0}$) it follows that ${f}$ extends to a function ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$. Also, we can check that ${\lim_n \frac{a_n}{n!} = 0}$ hence ${f}$ is even analytic.

Thus by Strassman’s theorem, ${f}$ is either identically zero, or else it has finitely many zeros, as desired. $\Box$

A trailer for p-adic analysis, first half: USA TST 2003

I think this post is more than two years late in coming, but anywhow…

This post introduces the ${p}$-adic integers ${\mathbb Z_p}$, and the ${p}$-adic numbers ${\mathbb Q_p}$. The one-sentence description is that these are “integers/rationals carrying full mod ${p^e}$ information” (and only that information).

The first four sections will cover the founding definitions culminating in a short solution to a USA TST problem.

In this whole post, ${p}$ is always a prime. Much of this is based off of Chapter 3A from Straight from the Book.

1. Motivation

Before really telling you what ${\mathbb Z_p}$ and ${\mathbb Q_p}$ are, let me tell you what you might expect them to do.

In elementary/olympiad number theory, we’re already well-familiar with the following two ideas:

• Taking modulo a prime ${p}$ or prime ${p^e}$, and
• Looking at the exponent ${\nu_p}$.

Let me expand on the first point. Suppose we have some Diophantine equation. In olympiad contexts, one can take an equation modulo ${p}$ to gain something else to work with. Unfortunately, taking modulo ${p}$ loses some information: (the reduction ${\mathbb Z \twoheadrightarrow \mathbb Z/p}$ is far from injective).

If we want finer control, we could consider instead taking modulo ${p^2}$, rather than taking modulo ${p}$. This can also give some new information (cubes modulo ${9}$, anyone?), but it has the disadvantage that ${\mathbb Z/p^2}$ isn’t a field, so we lose a lot of the nice algebraic properties that we got if we take modulo ${p}$.

One of the goals of ${p}$-adic numbers is that we can get around these two issues I described. The ${p}$-adic numbers we introduce is going to have the following properties:

1. You can “take modulo ${p^e}$ for all ${e}$ at once”. In olympiad contexts, we are used to picking a particular modulus and then seeing what happens if we take that modulus. But with ${p}$-adic numbers, we won’t have to make that choice. An equation of ${p}$-adic numbers carries enough information to take modulo ${p^e}$.
2. The numbers ${\mathbb Q_p}$ form a field, the nicest possible algebraic structure: ${1/p}$ makes sense. Contrast this with ${\mathbb Z/p^2}$, which is not even an integral domain.
3. It doesn’t lose as much information as taking modulo ${p}$ does: rather than the surjective ${\mathbb Z \twoheadrightarrow \mathbb Z/p}$ we have an injective map ${\mathbb Z \hookrightarrow \mathbb Z_p}$.
4. Despite this, you “ignore” some “irrelevant” data. Just like taking modulo ${p}$, you want to zoom-in on a particular type of algebraic information, and this means necessarily losing sight of other things. (To draw an analogy: the equation ${ a^2 + b^2 + c^2 + d^2 = -1}$ has no integer solutions, because, well, squares are nonnegative. But you will find that this equation has solutions modulo any prime ${p}$, because once you take modulo ${p}$ you stop being able to talk about numbers being nonnegative. The same thing will happen if we work in ${p}$-adics: the above equation has a solution in ${\mathbb Z_p}$ for every prime ${p}$.)

So, you can think of ${p}$-adic numbers as the right tool to use if you only really care about modulo ${p^e}$ information, but normal ${\mathbb Z/p^e}$ isn’t quite powerful enough.

To be more concrete, I’ll give a poster example now:

Example 1 (USA TST 2002/2)

For a prime ${p}$, show the value of

$\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} \pmod{p^3}$

does not depend on ${x}$.

Here is a problem where we clearly only care about ${p^e}$-type information. Yet it’s a nontrivial challenge to do the necessary manipulations mod ${p^3}$ (try it!). The basic issue is that there is no good way to deal with the denominators modulo ${p^3}$ (in part ${\mathbb Z/p^3}$ is not even an integral domain).

However, with ${p}$-adic analysis we’re going to be able to overcome these limitations and give a “straightforward” proof by using the identity

$\displaystyle \left( 1 + \frac{px}{k} \right)^{-2} = \sum_{n \ge 0} \binom{-2}{n} \left( \frac{px}{k} \right)^n.$

Such an identity makes no sense over ${\mathbb Q}$ or ${\mathbb R}$ for converge reasons, but it will work fine over the ${\mathbb Q_p}$, which is all we need.

2. Algebraic perspective

We now construct ${\mathbb Z_p}$ and ${\mathbb Q_p}$. I promised earlier that a ${p}$-adic integer will let you look at “all residues modulo ${p^e}$” at once. This definition will formalize this.

2.1. Definition of ${\mathbb Z_p}$

Definition 2 (Introducing ${\mathbb Z_p}$)

A ${p}$-adic integer is a sequence

$\displaystyle x = (x_1 \bmod p, \; x_2 \bmod{p^2}, \; x_3 \bmod{p^3}, \; \dots)$

of residues ${x_e}$ modulo ${p^e}$ for each integer ${e}$, satisfying the compatibility relations ${x_i \equiv x_j \pmod{p^i}}$ for ${i < j}$.

The set ${\mathbb Z_p}$ of ${p}$-adic integers forms a ring under component-wise addition and multiplication.

Example 3 (Some ${3}$-adic integers)

Let ${p=3}$. Every usual integer ${n}$ generates a (compatible) sequence of residues modulo ${p^e}$ for each ${e}$, so we can view each ordinary integer as ${p}$-adic one:

$\displaystyle 50 = \left( 2 \bmod 3, \; 5 \bmod 9, \; 23 \bmod{27}, \; 50 \bmod{81}, \; 50 \bmod{243}, \; \dots \right).$

On the other hand, there are sequences of residues which do not correspond to any usual integer despite satisfying compatibility relations, such as

$\displaystyle \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right)$

which can be thought of as ${x = 1 + p + p^2 + \dots}$.

In this way we get an injective map

$\displaystyle \mathbb Z \hookrightarrow \mathbb Z_p \qquad n \mapsto \left( n \bmod p, n \bmod{p^2}, n \bmod{p^3}, \dots \right)$

which is not surjective. So there are more ${p}$-adic integers than usual integers.

(Remark for experts: those of you familiar with category theory might recognize that this definition can be written concisely as

$\displaystyle \mathbb Z_p \overset{\mathrm{def}}{=} \varprojlim \mathbb Z/p^e \mathbb Z$

where the inverse limit is taken across ${e \ge 1}$.)

Exercise 4

Check that ${\mathbb Z_p}$ is an integral domain.

2.2. Base ${p}$ expansion

Here is another way to think about ${p}$-adic integers using “base ${p}$”. As in the example earlier, every usual integer can be written in base ${p}$, for example

$\displaystyle 50 = \overline{1212}_3 = 2 \cdot 3^0 + 1 \cdot 3^1 + 2 \cdot 3^2 + 1 \cdot 3^3.$

More generally, given any ${x = (x_1, \dots) \in \mathbb Z_p}$, we can write down a “base ${p}$” expansion in the sense that there are exactly ${p}$ choices of ${x_k}$ given ${x_{k-1}}$. Continuing the example earlier, we would write

\displaystyle \begin{aligned} \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right) &= 1 + 3 + 3^2 + \dots \\ &= \overline{\dots1111}_3 \end{aligned}

and in general we can write

$\displaystyle x = \sum_{k \ge 0} a_k p^k = \overline{\dots a_2 a_1 a_0}_p$

where ${a_k \in \{0, \dots, p-1\}}$, such that the equation holds modulo ${p^e}$ for each ${e}$. Note the expansion is infinite to the left, which is different from what you’re used to.

(Amusingly, negative integers also have infinite base ${p}$ expansions: ${-4 = \overline{\dots222212}_3}$, corresponding to ${(2 \bmod 3, \; 5 \bmod 9, \; 23 \bmod{27}, \; 77 \bmod{81} \dots)}$.)

Thus you may often hear the advertisement that a ${p}$-adic integer is an “possibly infinite base ${p}$ expansion”. This is correct, but later on we’ll be thinking of ${\mathbb Z_p}$ in a more and more “analytic” way, and so I prefer to think of this as a “Taylor series with base ${p}$. Indeed, much of your intuition from generating functions ${K[[X]]}$ (where ${K}$ is a field) will carry over to ${\mathbb Z_p}$.

2.3. Constructing ${\mathbb Q_p}$

Here is one way in which your intuition from generating functions carries over:

Proposition 5 (Non-multiples of ${p}$ are all invertible)

The number ${x \in \mathbb Z_p}$ is invertible if and only if ${x_1 \ne 0}$. In symbols,

$\displaystyle x \in \mathbb Z_p^\times \iff x \not\equiv 0 \pmod p.$

Contrast this with the corresponding statement for ${K[ [ X ] ]}$: a generating function ${F \in K[ [ X ] ]}$ is invertible iff ${F(0) \neq 0}$.

Proof: If ${x \equiv 0 \pmod p}$ then ${x_1 = 0}$, so clearly not invertible. Otherwise, ${x_e \not\equiv 0 \pmod p}$ for all ${e}$, so we can take an inverse ${y_e}$ modulo ${p^e}$, with ${x_e y_e \equiv 1 \pmod{p^e}}$. As the ${y_e}$ are themselves compatible, the element ${(y_1, y_2, \dots)}$ is an inverse. $\Box$

Example 6 (We have ${-\frac{1}{2} = \overline{\dots1111}_3 \in \mathbb Z_3}$)

We claim the earlier example is actually

\displaystyle \begin{aligned} -\frac{1}{2} = \left( 1 \bmod 3, \; 4 \bmod 9, \; 13 \bmod{27}, \; 40 \bmod{81}, \; \dots \right) &= 1 + 3 + 3^2 + \dots \\ &= \overline{\dots1111}_3. \end{aligned}

Indeed, multiplying it by ${-2}$ gives

$\displaystyle \left( -2 \bmod 3, \; -8 \bmod 9, \; -26 \bmod{27}, \; -80 \bmod{81}, \; \dots \right) = 1.$

(Compare this with the “geometric series” ${1 + 3 + 3^2 + \dots = \frac{1}{1-3}}$. We’ll actually be able to formalize this later, but not yet.)

Remark 7 (${\frac{1}{2}}$ is an integer for ${p > 2}$)

The earlier proposition implies that ${\frac{1}{2} \in \mathbb Z_3}$ (among other things); your intuition about what is an “integer” is different here! In olympiad terms, we already knew ${\frac{1}{2} \pmod 3}$ made sense, which is why calling ${\frac{1}{2}}$ an “integer” in the ${3}$-adics is correct, even though it doesn’t correspond to any element of ${\mathbb Z}$.

Fun (but trickier) exercise: rational numbers correspond exactly to eventually periodic base ${p}$ expansions.

With this observation, here is now the definition of ${\mathbb Q_p}$.

Definition 8 (Introducing ${\mathbb Q_p}$)

Since ${\mathbb Z_p}$ is an integral domain, we let ${\mathbb Q_p}$ denote its field of fractions. These are the ${p}$-adic numbers.

Continuing our generating functions analogy:

$\displaystyle \mathbb Z_p \text{ is to } \mathbb Q_p \quad\text{as}\quad K[[X]] \text{ is to } K((X)).$

This means ${\mathbb Q_p}$ is “Laurent series with base ${p}$”, and in particular according to the earlier proposition we deduce:

Proposition 9 (${\mathbb Q_p}$ looks like formal Laurent series)

Every nonzero element of ${\mathbb Q_p}$ is uniquely of the form

$\displaystyle p^k u \qquad \text{ where } k \in \mathbb Z, \; u \in \mathbb Z_p^\times.$

Thus, continuing our base ${p}$ analogy, elements of ${\mathbb Q_p}$ are in bijection with “Laurent series”

$\displaystyle \sum_{k \ge -n} a_k p^k = \overline{\dots a_2 a_1 a_0 . a_{-1} a_{-2} \dots a_{-n}}_p$

for ${a_k \in \left\{ 0, \dots, p-1 \right\}}$. So the base ${p}$ representations of elements of ${\mathbb Q_p}$ can be thought of as the same as usual, but extending infinitely far to the left (rather than to the right).

(Fair warning: the field ${\mathbb Q_p}$ has characteristic zero, not ${p}$.)

Remark 10 (Warning on fraction field)

This result implies that you shouldn’t think about elements of ${\mathbb Q_p}$ as ${x/y}$ (for ${x,y \in \mathbb Z_p}$) in practice, even though this is the official definition (and what you’d expect from the name ${\mathbb Q_p}$). The only denominators you need are powers of ${p}$.

To keep pushing the formal Laurent series analogy, ${K((X))}$ is usually not thought of as quotient of generating functions but rather as “formal series with some negative exponents”. You should apply the same intuition on ${\mathbb Q_p}$.

(At this point I want to make a remark about the fact ${1/p \in \mathbb Q_p}$, connecting it to the wish-list of properties I had before. In elementary number theory you can take equations modulo ${p}$, but if you do the quantity ${n/p \bmod{p}}$ doesn’t make sense unless you know ${n \bmod{p^2}}$. You can’t fix this by just taking modulo ${p^2}$ since then you need ${n \bmod{p^3}}$ to get ${n/p \bmod{p^2}}$, ad infinitum. You can work around issues like this, but the nice feature of ${\mathbb Z_p}$ and ${\mathbb Q_p}$ is that you have modulo ${p^e}$ information for “all ${e}$ at once”: the information of ${x \in \mathbb Q_p}$ packages all the modulo ${p^e}$ information simultaneously. So you can divide by ${p}$ with no repercussions.)

3. Analytic perspective

3.1. Definition

Up until now we’ve been thinking about things mostly algebraically, but moving forward it will be helpful to start using the language of analysis. Usually, two real numbers are considered “close” if they are close on the number of line, but for ${p}$-adic purposes we only care about modulo ${p^e}$ information. So, we’ll instead think of two elements of ${\mathbb Z_p}$ or ${\mathbb Q_p}$ as “close” if they differ by a large multiple of ${p^e}$.

For this we’ll borrow the familiar ${\nu_p}$ from elementary number theory.

Definition 11 (${p}$-adic valuation and absolute value)

We define the ${p}$-adic valuation ${\nu_p : \mathbb Q_p^\times \rightarrow \mathbb Z}$ in the following two equivalent ways:

• For ${x = (x_1, x_2, \dots) \in \mathbb Z_p}$ we let ${\nu_p(x)}$ be the largest ${e}$ such that ${x_e \equiv 0 \pmod{p^e}}$ (or ${e=0}$ if ${x \in \mathbb Z_p^\times}$). Then extend to all of ${\mathbb Q_p^\times}$ by ${\nu_p(xy) = \nu_p(x) + \nu_p(y)}$.
• Each ${x \in \mathbb Q_p^\times}$ can be written uniquely as ${p^k u}$ for ${u \in \mathbb Z_p^\times}$, ${k \in \mathbb Z}$. We let ${\nu_p(x) = k}$.

By convention we set ${\nu_p(0) = +\infty}$. Finally, define the ${p}$-adic absolute value ${\left\lvert \bullet \right\rvert_p}$ by

$\displaystyle \left\lvert x \right\rvert_p = p^{-\nu_p(x)}.$

In particular ${\left\lvert 0 \right\rvert_p = 0}$.

This fulfills the promise that ${x}$ and ${y}$ are close if they look the same modulo ${p^e}$ for large ${e}$; in that case ${\nu_p(x-y)}$ is large and accordingly ${\left\lvert x-y \right\rvert_p}$ is small.

3.2. Ultrametric space

In this way, ${\mathbb Q_p}$ and ${\mathbb Z_p}$ becomes a metric space with metric given by ${\left\lvert x-y \right\rvert_p}$.

Exercise 12

Suppose ${f \colon \mathbb Z_p \rightarrow \mathbb Q_p}$ is continuous and ${f(n) = (-1)^n}$ for every ${n \in \mathbb Z_{\ge 0}}$. Prove that ${p = 2}$.

In fact, these spaces satisfy a stronger form of the triangle inequality than you are used to from ${\mathbb R}$.

Proposition 13 (${\left\lvert \bullet \right\rvert_p}$ is an ultrametric)

For any ${x,y \in \mathbb Z_p}$, we have the strong triangle inequality

$\displaystyle \left\lvert x+y \right\rvert_p \le \max \left\{ \left\lvert x \right\rvert_p, \left\lvert y \right\rvert_p \right\}.$

Equality holds if (but not only if) ${\left\lvert x \right\rvert_p \neq \left\lvert y \right\rvert_p}$.

However, ${\mathbb Q_p}$ is more than just a metric space: it is a field, with its own addition and multiplication. This means we can do analysis just like in ${\mathbb R}$ or ${\mathbb C}$: basically, any notion such as “continuous function”, “convergent series”, et cetera has a ${p}$-adic analog. In particular, we can define what it means for an infinite sum to converge:

Definition 14 (Convergence notions)

Here are some examples of ${p}$-adic analogs of “real-world” notions.

• A sequence ${s_1}$, \dots converges to a limit ${L}$ if ${\lim_{n \rightarrow \infty} \left\lvert s_n - L \right\rvert_p = 0}$.
• The infinite series ${\sum_k x_k}$ converges if the sequence of partial sums ${s_1 = x_1}$, ${s_2 = x_1 + x_2}$, \dots, converges to some limit.
• \dots et cetera \dots

With this definition in place, the “base ${p}$” discussion we had earlier is now true in the analytic sense: if ${x = \overline{\dots a_2 a_1 a_0}_p \in \mathbb Z_p}$ then

$\displaystyle \sum_{k=0}^\infty a_k p^k \quad\text{converges to } x.$

Indeed, the ${n}$th partial sum is divisible by ${p^n}$, hence the partial sums approach ${x}$ as ${n \rightarrow \infty}$.

While the definitions are all the same, there are some changes in properties that should be true. For example, in ${\mathbb Q_p}$ convergence of partial sums is simpler:

Proposition 15 (${|x_k|_p \rightarrow 0}$ iff convergence of series)

A series ${\sum_{k=1}^\infty x_k}$ in ${\mathbb Q_p}$ converges to some limit if and only if ${\lim_{k \rightarrow \infty} |x_k|_p = 0}$.

Contrast this with ${\sum \frac1n = \infty}$ in ${\mathbb R}$. You can think of this as a consequence of strong triangle inequality. Proof: By multiplying by a large enough power of ${p}$, we may assume ${x_k \in \mathbb Z_p}$. (This isn’t actually necessary, but makes the notation nicer.)

Observe that ${x_k \pmod p}$ must eventually stabilize, since for large enough ${n}$ we have ${\left\lvert x_n \right\rvert_p < 1 \iff \nu_p(x_n) \ge 1}$. So let ${a_1}$ be the eventual residue modulo ${p}$ of ${\sum_{k=0}^N x_k \pmod p}$ for large ${N}$. In the same way let ${a_2}$ be the eventual residue modulo ${p^2}$, and so on. Then one can check we approach the limit ${a = (a_1, a_2, \dots)}$. $\Box$

Here’s a couple exercises to get you used to thinking of ${\mathbb Z_p}$ and ${\mathbb Q_p}$ as metric spaces.

Exercise 16 (${\mathbb Z_p}$ is compact)

Show that ${\mathbb Q_p}$ is not compact, but ${\mathbb Z_p}$ is. (For the latter, I recommend using sequential continuity.)

Exercise 17 (Totally disconnected)

Show that both ${\mathbb Z_p}$ and ${\mathbb Q_p}$ are totally disconnected: there are no connected sets other than the empty set and singleton sets.

3.3. More fun with geometric series

While we’re at it, let’s finally state the ${p}$-adic analog of the geometric series formula.

Proposition 18 (Geometric series)

Let ${x \in \mathbb Z_p}$ with ${\left\lvert x \right\rvert_p < 1}$. Then

$\displaystyle \frac{1}{1-x} = 1 + x + x^2 + x^3 + \dots.$

Proof: Note that the partial sums satisfy ${1 + x + x^2 + \dots + x^n = \frac{1-x^n}{1-x}}$, and ${x^n \rightarrow 0}$ as ${n \rightarrow \infty}$ since ${\left\lvert x \right\rvert_p < 1}$. $\Box$

So, ${1 + 3 + 3^2 + \dots = -\frac{1}{2}}$ is really a correct convergence in ${\mathbb Z_3}$. And so on.

If you buy the analogy that ${\mathbb Z_p}$ is generating functions with base ${p}$, then all the olympiad generating functions you might be used to have ${p}$-adic analogs. For example, you can prove more generally that:

Theorem 19 (Generalized binomial theorem)

If ${x \in \mathbb Z_p}$ and ${\left\lvert x \right\rvert_p < 1}$, then for any ${r \in \mathbb Q}$ we have the series convergence

$\displaystyle \sum_{n \ge 0} \binom rn x^n = (1+x)^r.$

(I haven’t defined ${(1+x)^r}$, but it has the properties you expect.) The proof is as in the real case; even the theorem statement is the same except for the change for the extra subscript of ${p}$. I won’t elaborate too much on this now, since ${p}$-adic exponentiation will be described in much more detail in the next post.

3.4. Completeness

Note that the definition of ${\left\lvert \bullet \right\rvert_p}$ could have been given for ${\mathbb Q}$ as well; we didn’t need ${\mathbb Q_p}$ to introduce it (after all, we have ${\nu_p}$ in olympiads already). The big important theorem I must state now is:

Theorem 20 (${\mathbb Q_p}$ is complete)

The space ${\mathbb Q_p}$ is the completion of ${\mathbb Q}$ with respect to ${\left\lvert \bullet \right\rvert_p}$.

This is the definition of ${\mathbb Q_p}$ you’ll see more frequently; one then defines ${\mathbb Z_p}$ in terms of ${\mathbb Q_p}$ (rather than vice-versa) according to

$\displaystyle \mathbb Z_p = \left\{ x \in \mathbb Q_p : \left\lvert x \right\rvert_p \le 1 \right\}.$

(Remark for experts: ${\mathbb Q_p}$ is a field with ${\nu_p}$ a non-Arcihmedian valuation; then ${\mathbb Z_p}$ is its valuation ring.)

Let me justify why this definition is philosophically nice.

Suppose you are a numerical analyst and you want to estimate the value of the sum

$\displaystyle S = \frac{1}{1^2} + \frac{1}{2^2} + \dots + \frac{1}{10000^2}$

to within ${0.001}$. The sum ${S}$ consists entirely of rational numbers, so the problem statement would be fair game for ancient Greece. But it turns out that in order to get a good estimate, it really helps if you know about the real numbers: because then you can construct the infinite series ${\sum_{n \ge 1} n^{-2} = \frac16 \pi^2}$, and deduce that ${S \approx \frac{\pi^2}{6}}$, up to some small error term from the terms past ${\frac{1}{10001^2}}$, which can be bounded.

Of course, in order to have access to enough theory to prove that ${S = \pi^2/6}$, you need to have the real numbers; it’s impossible to do serious analysis in the non-complete space ${\mathbb Q}$, where e.g. the sequence ${1}$, ${1.4}$, ${1.41}$, ${1.414}$, \dots is considered “not convergent” because ${\sqrt2 \notin \mathbb Q}$. Instead, all analysis is done in the completion of ${\mathbb Q}$, namely ${\mathbb R}$.

Now suppose you are an olympiad contestant and want to estimate the sum

$\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2}$

to within mod ${p^3}$ (i.e. to within ${p^{-3}}$ in ${\left\lvert \bullet \right\rvert_p}$). Even though ${f_p(x)}$ is a rational number, it still helps to be able to do analysis with infinite sums, and then bound the error term (i.e. take mod ${p^3}$). But the space ${\mathbb Q}$ is not complete with respect to ${\left\lvert \bullet \right\rvert_p}$ either, and thus it makes sense to work in the completion of ${\mathbb Q}$ with respect to ${\left\lvert \bullet \right\rvert_p}$. This is exactly ${\mathbb Q_p}$.

4. Solving USA TST 2002/2

Let’s finally solve Example~1, which asks to compute

$\displaystyle f_p(x) = \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} \pmod{p^3}.$

Armed with the generalized binomial theorem, this becomes straightforward.

\displaystyle \begin{aligned} f_p(x) &= \sum_{k=1}^{p-1} \frac{1}{(px+k)^2} = \sum_{k=1}^{p-1} \frac{1}{k^2} \left( 1 + \frac{px}{k} \right)^{-2} \\ &= \sum_{k=1}^{p-1} \frac{1}{k^2} \sum_{n \ge 0} \binom{-2}{n} \left( \frac{px}{k} \right)^{n} \\ &= \sum_{n \ge 0} \binom{-2}{n} \sum_{k=1}^{p-1} \frac{1}{k^2} \left( \frac{x}{k} \right)^{n} p^n \\ &\equiv \sum_{k=1}^{p-1} \frac{1}{k^2} - 2x \left( \sum_{k=1}^{p-1} \frac{1}{k^3} \right) p + 3x^2 \left( \sum_{k=1}^{p-1} \frac{1}{k^4} \right) p^2 \pmod{p^3}. \end{aligned}

Using the elementary facts that ${p^2 \mid \sum_k k^{-3}}$ and ${p \mid \sum k^{-4}}$, this solves the problem.

I’m reading through Primes of the Form ${x^2+ny^2}$, by David Cox (link; it’s good!). Here are the high-level notes I took on the first chapter, which is about the theory of quadratic forms.

(Meta point re blog: I’m probably going to start posting more and more of these more high-level notes/sketches on this blog on topics that I’ve been just learning. Up til now I’ve been mostly only posting things that I understand well and for which I have a very polished exposition. But the perfect is the enemy of the good here; given that I’m taking these notes for my own sake, I may as well share them to help others.)

1. Overview

Definition 1

For us a quadratic form is a polynomial ${Q = Q(x,y) = ax^2 + bxy + cy^2}$, where ${a}$, ${b}$, ${c}$ are some integers. We say that it is primitive if ${\gcd(a,b,c) = 1}$.

For example, we have the famous quadratic form

$\displaystyle Q_{\text{Fermat}}(x,y) = x^2+y^2.$

As readers are probably aware, we can say a lot about exactly which integers can be represented by ${Q_{\text{Fermat}}}$: by Fermat’s Christmas theorem, the primes ${p \equiv 1 \pmod 4}$ (and ${p=2}$) can all be written as the sum of two squares, while the primes ${p \equiv 3 \pmod 4}$ cannot. For convenience, let us say that:

Definition 2

Let ${Q}$ be a quadratic form. We say it represents the integer ${m}$ if there exists ${x,y \in \mathbb Z}$ with ${m = Q(x,y)}$. Moreover, ${Q}$ properly represents ${m}$ if one can find such ${x}$ and ${y}$ which are also relatively prime.

The basic question is: what can we say about which primes/integers are properly represented by a quadratic form? In fact, we will later restrict our attention to “positive definite” forms (described later).

For example, Fermat’s Christmas theorem now rewrites as:

Theorem 3 (Fermat’s Christmas theorem for primes)

An odd prime ${p}$ is (properly) represented by ${Q_{\text{Fermat}}}$ if and only if ${p \equiv 1 \pmod 4}$.

The proof of this is classical, see for example my olympiad handout. We also have the formulation for odd integers:

Theorem 4 (Fermat’s Christmas theorem for odd integers)

An odd integer ${m}$ is properly represented by ${Q_{\text{Fermat}}}$ if and only if all prime factors of ${m}$ are ${1 \pmod 4}$.

Proof: For the “if” direction, we use the fact that ${Q_{\text{Fermat}}}$ is multiplicative in the sense that

$\displaystyle (x^2+y^2)(u^2+v^2) = (xu \pm yv)^2 + (xv \mp yu)^2.$

For the “only if” part we use the fact that if a multiple of a prime ${p}$ is properly represented by ${Q_{\text{Fermat}}}$, then so is ${p}$. This follows by noticing that if ${x^2+y^2 \equiv 0 \pmod p}$ (and ${xy \not\equiv 0 \pmod p}$) then ${(x/y)^2 \equiv -1 \pmod p}$. $\Box$
Tangential remark: the two ideas in the proof will grow up in the following way.

• The fact that ${Q_{\text{Fermat}}}$ “multiplies nicely” will grow up to become the so-called composition of quadratic forms.
• The second fact will not generalize for an arbitrary form ${Q}$. Instead, we will see that if a multiple of ${p}$ is represented by a form ${Q}$ then some form of the same “discriminant” will represent the prime ${p}$, but this form need not be the same as ${Q}$ itself.

2. Equivalence of forms, and the discriminant

The first thing we should do is figure out when two forms are essentially the same: for example, ${x^2+5y^2}$ and ${5x^2+y^2}$ should clearly be considered the same. More generally, if we think of ${Q}$ as acting on ${\mathbb Z^{\oplus 2}}$ and ${T}$ is any automorphism of ${\mathbb Z^{\oplus 2}}$, then ${Q \circ T}$ should be considered the same as ${Q}$. Specifically,

Definition 5

Two forms ${Q_1}$ and ${Q_2}$ said to be equivalent if there exists

$\displaystyle T = \begin{pmatrix} p & q \\ r & s \end{pmatrix} \in \text{GL }(2,\mathbb Z)$

such that ${Q_2(x,y) = Q_1(px+ry, qx+sy)}$. We have ${\det T = ps-qr = \pm 1}$ and so we say the equivalence is

• a proper equivalence if ${\det T = +1}$, and
• an improper equivalence if ${\det T = -1}$.

So we generally will only care about forms up to proper equivalence. (It will be useful to distinguish between proper/improper equivalence later.)

Naturally we seek some invariants under this operation. By far the most important is:

Definition 6

The discriminant of a quadratic form ${Q = ax^2 + bxy + cy^2}$ is defined as

$\displaystyle D = b^2-4ac.$

The discriminant is invariant under equivalence (check this). Note also that we also have ${D \equiv 0 , 1 \pmod 4}$.

Observe that we have

$\displaystyle 4a \cdot (ax^2+bxy+cy^2) = (2ax + by)^2 - Dy^2.$

So if ${D < 0}$ and ${a > 0}$ (thus ${c > 0}$ too) then ${ax^2+bxy+cy^2 > 0}$ for all ${x,y > 0}$. Such quadratic forms are called positive definite, and we will restrict our attention to these forms.

Now that we have this invariant, we may as well classify equivalence classes of quadratic forms for a fixed discriminant. It turns out this can be done explicitly.

Definition 7

A quadratic form ${Q = ax^2 + bxy + cy^2}$ is reduced if

• it is primitive and positive definite,
• ${|b| \le a \le c}$, and
• ${b \ge 0}$ if either ${|b| = a}$ or ${a = c}$.

Exercise 8

Check there only finitely many reduced forms of a fixed discriminant.

Then the big huge theorem is:

Theorem 9 (Reduced forms give a set of representatives)

Every primitive positive definite form ${Q}$ of discriminant is properly equivalent to a unique reduced form. We call this the reduction of ${Q}$.

Proof: Omitted due to length, but completely elementary. It is a reduction argument with some number of cases. $\Box$

Thus, for any discriminant ${D}$ we can consider the set

$\displaystyle \text{Cl}(D) = \left\{ \text{reduced forms of discriminant } D \right\}$

which will be the equivalence classes of positive definite of discriminant ${D}$. By abuse of notation we will also consider it as the set of equivalence classes of primitive positive definite forms of discriminant ${D}$.

We also define ${h(D) = \left\lvert \text{Cl}(D) \right\rvert}$; by the exercise, ${h(D) < \infty}$. This is called the class number.

Moreover, we have ${h(D) \ge 1}$, because we can take ${x^2 - D/4 y^2}$ for ${D \equiv 0 \pmod 4}$ and ${x^2 + xy + (1-D)/4 y^2}$ for ${D \equiv 1 \pmod 4}$. We call this form the principal form.

Example 10 (Examples of quadratic forms with ${h(D) = 1}$, ${D \equiv 0 \pmod 4}$)

The following discriminants have class number ${h(D) = 1}$, hence having only the principal form:

• ${D = -4}$, with form ${x^2 + y^2}$.
• ${D = -8}$, with form ${x^2 + 2y^2}$.
• ${D = -12}$, with form ${x^2+3y^2}$.
• ${D = -16}$, with form ${x^2 + 4y^2}$.
• ${D = -28}$, with form ${x^2 + 7y^2}$.

This is in fact the complete list when ${D \equiv 0 \pmod 4}$.

Example 11 (Examples of quadratic forms with ${h(D) = 1}$, ${D \equiv 1 \pmod 4}$)

The following discriminants have class number ${h(D) = 1}$, hence having only the principal form:

• ${D = -3}$, with form ${x^2 + xy + y^2}$.
• ${D = -7}$, with form ${x^2 + xy + 2y^2}$.
• ${D = -11}$, with form ${x^2 + xy + 3y^2}$.
• ${D = -19}$, with form ${x^2 + xy + 5y^2}$.
• ${D = -27}$, with form ${x^2 + xy + 7y^2}$.
• ${D = -43}$, with form ${x^2 + xy + 11y^2}$.
• ${D = -67}$, with form ${x^2 + xy + 17y^2}$.
• ${D = -163}$, with form ${x^2 + xy + 41y^2}$.

This is in fact the complete list when ${D \equiv 1 \pmod 4}$.

Example 12 (More examples of quadratic forms)

Here are tables for small discriminants with ${h(D) > 1}$. When ${D \equiv 0 \pmod 4}$ we have

• ${D = -20}$, with ${h(D) = 2}$ forms ${2x^2 + 2xy + 3y^2}$ and ${x^2 + 5y^2}$.
• ${D = -24}$, with ${h(D) = 2}$ forms ${2x^2 + 3y^2}$ and ${x^2 + 6y^2}$.
• ${D = -32}$, with ${h(D) = 2}$ forms ${3x^2 + 2xy + 3y^2}$ and ${x^2 + 8y^2}$.
• ${D = -36}$, with ${h(D) = 2}$ forms ${2x^2 + 2xy + 5y^2}$ and ${x^2 + 9y^2}$.
• ${D = -40}$, with ${h(D) = 2}$ forms ${2x^2 + 5y^2}$ and ${x^2 + 10y^2}$.
• ${D = -44}$, with ${h(D) = 3}$ forms ${3x^2 \pm 2xy + 4y^2}$ and ${x^2 + 11y^2}$.

As for ${D \equiv 1 \pmod 4}$ we have

• ${D = -15}$, with ${h(D) = 2}$ forms ${2x^2 + xy + 2y^2}$ and ${x^2 + xy + 4y^2}$.
• ${D = -23}$, with ${h(D) = 3}$ forms ${2x^2 \pm xy + 3y^2}$ and ${x^2+ xy + 6y^2}$.
• ${D = -31}$, with ${h(D) = 3}$ forms ${2x^2 \pm xy + 4}$ and ${x^2 + xy + 8y^2}$.
• ${D = -39}$, with ${h(D) = 4}$ forms ${3x^2 + 3xy + 4y^2}$, ${2x^2 \pm 2xy + 5y^2}$ and ${x^2 + xy + 10y^2}$.

Example 13 (Even More Examples of quadratic forms)

Here are some more selected examples:

• ${D = -56}$ has ${h(D) = 4}$ forms ${x^2+14y^2}$, ${2x^2+7y^2}$ and ${3x^2 \pm 2xy + 5y^2}$.
• ${D = -108}$ has ${h(D) = 3}$ forms ${x^2+27y^2}$ and ${4x^2 \pm 2xy + 7y^2}$.
• ${D = -256}$ has ${h(D) = 4}$ forms ${x^2+64y^2}$, ${4x^2+4xy+17y^2}$ and ${5x^2\pm2xy+13y^2}$.

4. The Character ${\chi_D}$

We can now connect this to primes ${p}$ as follows. Earlier we played with ${Q_{\text{Fermat}} = x^2+y^2}$, and observed that for odd primes ${p}$, ${p \equiv 1 \pmod 4}$ if and only if some multiple of ${p}$ is properly represented by ${Q_{\text{Fermat}}}$.

Our generalization is as follows:

Theorem 14 (Primes represented by some quadratic form)

Let ${D < 0}$ be a discriminant, and let ${p \nmid D}$ be an odd prime. Then the following are equivalent:

• ${\left( \frac Dp \right) = 1}$, i.e. ${D}$ is a quadratic residue modulo ${p}$.
• The prime ${p}$ is (properly) represented by some reduced quadratic form in ${\text{Cl}(D)}$.

This generalizes our result for ${Q_{\text{Fermat}}}$, but note that it uses ${h(-4) = 1}$ in an essential way! That is: if ${(-1/p) = 1}$, we know ${p}$ is represented by some quadratic form of discriminant ${D = -4}$\dots but only since ${h(-4) = 1}$ do we know that this form reduces to ${Q_{\text{Fermat}} = x^2+y^2}$.

Proof: First assume WLOG that ${p \nmid 4a}$ and ${Q(x,y) \equiv 0 \pmod p}$. Thus ${p \nmid y}$, since otherwise this would imply ${x \equiv y \equiv 0 \pmod p}$. Then

$\displaystyle 0 \equiv 4a \cdot Q(x,y) \equiv (2ax + by)^2 - Dy^2 \pmod p$

hence ${D \equiv \left( 2axy^{-1} + b \right)^2 \pmod p}$.

The converse direction is amusing: let ${m^2 = D + pk}$ for integers ${m}$, ${k}$. Consider the quadratic form

$\displaystyle Q(x,y) = px^2 + mxy + ky^2.$

It is primitive of discriminant ${D}$ and ${Q(1,0) = p}$. Now ${Q}$ may not be reduced, but that’s fine: just take the reduction of ${Q}$, which must also properly represent ${p}$. $\Box$

Thus to every discriminant ${D < 0}$ we can attach the Legendre character (is that the name?), which is a homomorphism

$\displaystyle \chi_D = \left( \tfrac{D}{\bullet} \right) : \left( \mathbb Z / D\mathbb Z \right)^\times \rightarrow \{ \pm 1 \}$

with the property that if ${p}$ is a rational prime not dividing ${D}$, then ${\chi_D(p) = \left( \frac{D}{p} \right)}$. This is abuse of notation since I should technically write ${\chi_D(p \pmod D)}$, but there is no harm done: one can check by quadratic reciprocity that if ${p \equiv q \pmod D}$ then ${\chi_D(p) = \chi_D(q)}$. Thus our previous result becomes:

Theorem 15 (${\ker(\chi_D)}$ consists of representable primes)

Let ${p \nmid D}$ be prime. Then ${p \in \ker(\chi_D)}$ if and only if some quadratic form in ${\text{Cl}(D)}$ represents ${p}$.

As a corollary of this, using the fact that ${h(-8) = h(-12) = h(-28) = 1}$ one can prove that

Corollary 16 (Fermat-type results for ${h(-4n) = 1}$)

Let ${p > 7}$ be a prime. Then ${p}$ is

• of the form ${x^2 + 2y^2}$ if and only if ${p \equiv 1, 3 \pmod 8}$.
• of the form ${x^2 + 3y^2}$ if and only if ${p \equiv 1 \pmod 3}$.
• of the form ${x^2 + 7y^2}$ if and only if ${p \equiv 1, 2, 4 \pmod 7}$.

Proof: The congruence conditions are equivalent to ${(-4n/p) = 1}$, and as before the only point is that the only reduced quadratic form for these ${D = -4n}$ is the principal one. $\Box$

5. Genus theory

What if ${h(D) > 1}$? Sometimes, we can still figure out which primes go where just by taking mods.

Let ${Q \in \text{Cl}(D)}$. Then it represents some residue classes of ${(\mathbb Z/D\mathbb Z)^\times}$. In that case we call the set of residue classes represented the genus of the quadratic form ${Q}$.

Example 17 (Genus theory of ${D = -20}$)

Consider ${D = -20}$, with

$\displaystyle \ker(\chi_D) = \left\{ 1, 3, 7, 9 \right\} \subseteq (\mathbb Z/D\mathbb Z)^\times.$

We consider the two elements of ${\text{Cl}(D)}$:

• ${x^2 + 5y^2}$ represents ${1, 9 \in (\mathbb Z/20\mathbb Z)^\times}$.
• ${2x^2+2xy+3y^2}$ represents ${3, 7 \in (\mathbb Z/20\mathbb Z)^\times}$.

Now suppose for example that ${p \equiv 9 \pmod{20}}$. It must be represented by one of these two quadratic forms, but the latter form is never ${9 \pmod{20}}$ and so it must be the first one. Thus we conclude that

• ${p = x^2+5y^2}$ if and only if ${p \equiv 1, 9 \pmod{20}}$.
• ${p = 2x^2 + 2xy + 3y^2}$ if and only if ${p \equiv 3, 7 \pmod{20}}$.

The thing that makes this work is that each genus appears exactly once. We are not always so lucky: for example when ${D = -108}$ we have that

Example 18 (Genus theory of ${D = -108}$)

The two elements of ${\text{Cl}(-108)}$ are:

• ${x^2+27y^2}$, which represents exactly the ${1 \pmod 3}$ elements of ${(\mathbb Z/D\mathbb Z)^\times}$.
• ${4x^2 \pm 2xy + 7y^2}$, which also represents exactly the ${1 \pmod 3}$ elements of ${(\mathbb Z/D\mathbb Z)^\times}$.

So the best we can conclude is that ${p = x^2+27y^2}$ OR ${p = 4x^2\pm2xy+7y^2}$ if and only if ${p \equiv 1 \pmod 3}$ This is because the two distinct quadratic forms of discriminant ${-108}$ happen to have the same genus.

We now prove that:

Theorem 19 (Genii are cosets of ${\ker(\chi_D)}$)

Let ${D}$ be a discriminant and consider the Legendre character ${\chi_D}$.

• The genus of the principal form of discriminant ${D}$ constitutes a subgroup ${H}$ of ${\ker(\chi_D)}$, which we call the principal genus.
• Any genus of a quadratic form in ${\text{Cl}(D)}$ is a coset of the principal genus ${H}$ in ${\ker(\chi_D)}$.

Proof: For the first part, we aim to show ${H}$ is multiplicatively closed. For ${D \equiv 0 \pmod 4}$, ${D = -4n}$ we use the fact that

$\displaystyle (x^2+ny^2)(u^2+nv^2) = (xu \pm nyv)^2 + n(xv \mp yu)^2.$

For ${D \equiv 1 \pmod 4}$, we instead appeal to another “magic” identity

$\displaystyle 4\left( x^2+xy+\frac{1-D}{4}y^2 \right) \equiv (2x+y)^2 \pmod D$

and it follows from here that ${H}$ is actually the set of squares in ${(\mathbb Z/D\mathbb Z)^\times}$, which is obviously a subgroup.

Now we show that other quadratic forms have genus equal to a coset of the principal genus. For ${D \equiv 0 \pmod 4}$, with ${D = -4n}$ we can write

$\displaystyle a(ax^2+bxy+cy^2) = (ax+b/2 y)^2 + ny^2$

and thus the desired coset is shown to be ${a^{-1} H}$. As for ${D \equiv 1 \pmod 4}$, we have

$\displaystyle 4a \cdot (ax^2+bxy+cy^2) = (2ax + by)^2 - Dy^2 \equiv (2ax+by)^2 \pmod D$

so the desired coset is also ${a^{-1} H}$, since ${H}$ was the set of squares. $\Box$

Thus every genus is a coset of ${H}$ in ${\ker(\chi_D)}$. Thus:

Definition 20

We define the quotient group

$\displaystyle \text{Gen}(D) = \ker(\chi_D) / H$

which is the set of all genuses in discriminant ${D}$. One can view this as an abelian group by coset multiplication.

Thus there is a natural map

$\displaystyle \Phi_D : \text{Cl}(D) \twoheadrightarrow \text{Gen}(D).$

(The map is surjective by Theorem~14.) We also remark than ${\text{Gen}(D)}$ is quite well-behaved:

Proposition 21 (Structure of ${\text{Gen}(D)}$)

The group ${\text{Gen}(D)}$ is isomorphic to ${(\mathbb Z/2\mathbb Z)^{\oplus m}}$ for some integer ${m}$.

Proof: Observe that ${H}$ contains all the squares of ${\ker(\chi_D)}$: if ${f}$ is the principal form then ${f(t,0) = t^2}$. Thus claim each element of ${\text{Gen}(D)}$ has order at most ${2}$, which implies the result since ${\text{Gen}(D)}$ is a finite abelian group. $\Box$

In fact, one can compute the order of ${\text{Gen}(D)}$ exactly, but for this post I Will just state the result.

Theorem 22 (Order of ${\text{Gen}(D)}$)

Let ${D < 0}$ be a discriminant, and let ${r}$ be the number of distinct odd primes which divide ${D}$. Define ${\mu}$ by:

• ${\mu = r}$ if ${D \equiv 1 \pmod 4}$.
• ${\mu = r}$ if ${D = -4n}$ and ${n \equiv 3 \pmod 4}$.
• ${\mu = r+1}$ if ${D = -4n}$ and ${n \equiv 1,2 \pmod 4}$.
• ${\mu = r+1}$ if ${D = -4n}$ and ${n \equiv 4 \pmod 8}$.
• ${\mu = r+2}$ if ${D = -4n}$ and ${n \equiv 0 \pmod 8}$.

Then ${\left\lvert \text{Gen}(D) \right\rvert = 2^{\mu-1}}$.

6. Composition

We have already used once the nice identity

$\displaystyle (x^2+ny^2)(u^2+nv^2) = (xu \pm nyv)^2 + n(xv \mp yu)^2.$

We are going to try and generalize this for any two quadratic forms in ${\text{Cl}(D)}$. Specifically,

Proposition 23 (Composition defines a group operation)

Let ${f,g \in \text{Cl}(D)}$. Then there is a unique ${h \in \text{Cl}(D)}$ and bilinear forms ${B_i(x,y,z,w) = a_ixz + b_ixw + c_iyz + d_iyw}$ for ${i=1,2}$ such that

• ${f(x,y) g(z,w) = h(B_1(x,y,z,w), B_2(x,y,z,w))}$.
• ${a_1b_2 - a_2b_1 = +f(1,0)}$.
• ${a_1c_2 - a_2c_1 = +g(1,0)}$.

In fact, without the latter two constraints we would instead have ${a_1b_2 - a_2b_1 = \pm f(1,0)}$ and ${a_1c_2 - a_2c_1 = \pm g(1,0)}$, and each choice of signs would yield one of four (possibly different) forms. So requiring both signs to be positive makes this operation well-defined. (This is why we like proper equivalence; it gives us a well-defined group structure, whereas with improper equivalence it would be impossible to put a group structure on the forms above.)

Taking this for granted, we then have that

Theorem 24 (Form class group)

Let ${D \equiv 0, 1 \pmod 4}$, ${D < 0}$ be a discriminant. Then ${\text{Cl}(D)}$ becomes an abelian group under composition, where

• The identity of ${\text{Cl}(D)}$ is the principal form, and
• The inverse of the form ${ax^2+bxy+cy^2}$ is ${ax^2-bxy+cy^2}$.

This group is called the form class group.

We then have a group homomorphism

$\displaystyle \Phi_D : \text{Cl}(D) \twoheadrightarrow \text{Gen}(D).$

Observe that ${ax^2 + bxy + cy^2}$ and ${ax^2 - bxy + cy^2}$ are inverses and that their ${\Phi_D}$ images coincide (being improperly equivalent); this is expressed in the fact that ${\text{Gen}(D)}$ has elements of order ${\le 2}$. As another corollary, the number of elements of ${\text{Cl}(D)}$ with a given genus is always a power of two.

We now define:

Definition 25

An integer ${n \ge 1}$ is convenient if the following equivalent conditions hold:

• The principal form ${x^2+ny^2}$ is the only reduced form with the principal genus.
• ${\Phi_D}$ is injective (hence an isomorphism).
• ${\left\lvert h(D) \right\rvert = 2^{\mu-1}}$.

Thus we arrive at the following corollary:

Corollary 26 (Convenient numbers have nice representations)

Let ${n \ge 1}$ be convenient. Then ${p}$ is of the form ${x^2+ny^2}$ if and only if ${p}$ lies in the principal genus.

Hence the represent-ability depends only on ${p \pmod{4n}}$.

OEIS A000926 lists 65 convenient numbers. This sequence is known to be complete except for at most one more number; moreover the list is complete assuming the Grand Riemann Hypothesis.

7. Cubic and quartic reciprocity

To treat the cases where ${n}$ is not convenient, the correct thing to do is develop class field theory. However, we can still make a little bit more progress if we bring higher reciprocity theorems to bear: we’ll handle the cases ${n=27}$ and ${n=64}$, two examples of numbers which are not convenient.

7.1. Cubic reciprocity

First, we prove that

Theorem 27 (On ${p = x^2+27y^2}$)

A prime ${p > 3}$ is of the form ${x^2+27y^2}$ if and only if ${p \equiv 1 \pmod 3}$ and ${2}$ is a cubic residue modulo ${p}$.

To do this we use cubic reciprocity, which requires working in the Eisenstein integers ${\mathbb Z[\omega]}$ where ${\omega}$ is a cube root of unity. There are six units in ${\mathbb Z[\omega]}$ (the sixth roots of unity), hence each nonzero number has six associates (differing by a unit), and the ring is in fact a PID.

Now if we let ${\pi}$ be a prime not dividing ${3}$, and ${\alpha}$ is coprime to ${\pi}$, then we can define the cubic Legendre symbol by setting

$\displaystyle \left( \frac{\alpha}{\pi} \right)_3 \equiv \alpha^{\frac13(N\pi-1)} \pmod \pi \in \left\{ 1, \omega, \omega^2 \right\}.$

Moreover, we can define a primary prime ${\pi \nmid 3}$ to be one such that ${\pi \equiv -1 \pmod 3}$; given any prime exactly one of the six associates is primary. We then have the following reciprocity theorem:

Theorem 28 (Cubic reciprocity)

If ${\pi}$ and ${\theta}$ are disjoint primary primes in ${\mathbb Z[\omega]}$ then

$\displaystyle \left( \frac{\pi}{\theta} \right)_3 = \left( \frac{\theta}{\pi} \right)_3.$

We also have the following supplementary laws: if ${\pi = (3m-1) + 3n\omega}$, then

$\displaystyle \left( \frac{\omega}{\pi} \right)_3 = \omega^{m+n} \qquad\text{and}\qquad \left( \frac{1-\omega}{\pi} \right)_3 = \omega^{2m}.$

The first supplementary law is for the unit (analogous to ${(-1/p)}$) while the second reciprocity law handles the prime divisors of ${3 = -\omega^2(1-\omega)^2}$ (analogous to ${(2/p)}$.)

We can tie this back into ${\mathbb Z}$ as follows. If ${p \equiv 1 \pmod 3}$ is a rational prime then it is represented by ${x^2+xy+y^2}$, and thus we can put ${p = \pi \overline{\pi}}$ for some prime ${\pi}$, ${N(\pi) = p}$. Consequently, we have a natural isomorphism

$\displaystyle \mathbb Z[\omega] / \pi \mathbb Z[\omega] \cong \mathbb Z / p \mathbb Z.$

Therefore, we see that a given ${a \in (\mathbb Z/p\mathbb Z)^\times}$ is a cubic residue if and only if ${(\alpha/\pi)_3 = 1}$.

In particular, we have the following corollary, which is all we will need:

Corollary 29 (When ${2}$ is a cubic residue)

Let ${p \equiv 1 \pmod 3}$ be a rational prime, ${p > 3}$. Write ${p = \pi \overline{\pi}}$ with ${\pi}$ primary. Then ${2}$ is a cubic residue modulo ${p}$ if and only if ${\pi \equiv 1 \pmod 2}$.

Proof: By cubic reciprocity:

$\displaystyle \left( \frac{2}{\pi} \right)_3 = \left( \frac{\pi}{2} \right)_3 \equiv \pi^{\frac13(N2-1)} \equiv \pi \pmod 2.$

$\Box$

Now we give the proof of Theorem~27. Proof: First assume

$\displaystyle p = x^2+27y^2 = \left( x+3\sqrt 3 y \right)\left( x-3\sqrt 3 y \right).$

Let ${\pi = x + 3 \sqrt{-3} y = (x+3y) + 6y\omega}$ be primary, noting that ${\pi \equiv 1 \pmod 2}$. Now clearly ${p \equiv 1 \pmod 3}$, so done by corollary.

For the converse, assume ${p \equiv 1 \pmod 3}$, ${p = \pi \overline{\pi}}$ with ${\pi}$ primary and ${\pi \equiv 1 \pmod 2}$. If we set ${\pi = a + b\omega}$ for integers ${a}$ and ${b}$, then the fact that ${\pi \equiv 1 \pmod 2}$ and ${\pi \equiv -1 \pmod 3}$ is enough to imply that ${6 \mid b}$ (check it!). Moreover,

$\displaystyle p = a^2-ab+b^2 = \left( a - \frac{1}{2} b \right)^2 + 27 \left( \frac16b \right)^2$

as desired. $\Box$

7.2. Quartic reciprocity

This time we work in ${\mathbb Z[i]}$, for which there are four units ${\pm 1}$, ${\pm i}$. A prime is primary if ${\pi \equiv 1 \pmod{2+2i}}$; every prime not dividing ${2 = -i(1+i)^2}$ has a unique associate which is primary. Then we can as before define

$\displaystyle \alpha^{\frac14(N\pi-1)} \equiv \left( \frac{\alpha}{\pi} \right)_4 \pmod{\pi} \in \left\{ \pm 1, \pm i \right\}$

where ${\pi}$ is primary, and ${\alpha}$ is nonzero mod ${\pi}$. As before ${p \equiv 1 \pmod 4}$, ${p = \pi\overline{\pi}}$ we have that ${a}$ is a quartic residue modulo ${p}$ if and only if ${\left( a/\pi \right)_4 = 1}$ thanks to the isomorphism

$\displaystyle \mathbb Z[i] / \pi \mathbb Z[i] \cong \mathbb Z / p \mathbb Z.$

Now we have

Theorem 30 (Quartic reciprocity)

If ${\pi}$ and ${\theta}$ are distinct primary primes in ${\mathbb Z[i]}$ then

$\displaystyle \left( \frac{\theta}{\pi} \right)_4 = \left( \frac{\pi}{\theta} \right)_4 (-1)^{\frac{1}{16}(N\theta-1)(N\pi-1)}.$

We also have supplementary laws that state that if ${\pi = a+bi}$ is primary, then

$\displaystyle \left( \frac{i}{\pi} \right)_4 = i^{-\frac{1}{2}(a-1)} \qquad\text{and}\qquad \left( \frac{1+i}{\pi} \right)_4 = i^{\frac14(a-b-b^2-1)}.$

Again, the first law handles units, and the second law handles the prime divisors of ${2}$. The corollary we care about this time in fact uses only the supplemental laws:

Corollary 31 (When ${2}$ is a quartic residue)

Let ${p \equiv 1 \pmod 4}$ be a prime, and put ${p = \pi\overline{\pi}}$ with ${\pi = a+bi}$ primary. Then

$\displaystyle \left( \frac{2}{\pi} \right)_4 = i^{-b/2}$

and in particular ${2}$ is a quartic residue modulo ${p}$ if and only if ${b \equiv 0 \pmod 8}$.

Proof: Note that ${2 = i^3(1+i)^2}$ and applying the above. Therefore

$\displaystyle \left( \frac{2}{\pi} \right)_4 = \left( \frac{i}{\pi} \right)_4^3 \left( \frac{1+i}{\pi} \right)_4^2 = i^{-\frac32(a-1)} \cdot i^{\frac12(a-b-b^2-1)} = i^{-(a-1) - \frac{1}{2} b(b+1)}.$

Now we assumed ${a+bi}$ is primary. We claim that

$\displaystyle a - 1 + \frac{1}{2} b^2 \equiv 0 \pmod 4.$

Note that since ${(a+bi)-1}$ was is divisible by ${2+2i}$, hence ${N(2+2i)=8}$ divides ${(a-1)^2+b^2}$. Thus

$\displaystyle 2(a-1) + b^2 \equiv 2(a-1) + (a-1)^2 \equiv (a-1)(a-3) \equiv 0 \pmod 8$

since ${a}$ is odd and ${b}$ is even. Finally,

$\displaystyle \left( \frac{2}{\pi} \right)_4 = i^{-(a-1) - \frac{1}{2} b(b+1)} = i^{-\frac{1}{2} b + (a-1+\frac{1}{2} b^2)} \equiv i^{-\frac{1}{2} b} \pmod p.$

$\Box$

From here we quickly deduce

Theorem 32 (On ${p = x^2+64y^2}$)

If ${p > 2}$ is prime, then ${p = x^2+64y^2}$ if and only if ${p \equiv 1 \pmod 4}$ and ${2}$ is a quartic residue modulo ${p}$.

Vinogradov’s Three-Prime Theorem (with Sammy Luo and Ryan Alweiss)

This was my final paper for 18.099, seminar in discrete analysis, jointly with Sammy Luo and Ryan Alweiss.

We prove that every sufficiently large odd integer can be written as the sum of three primes, conditioned on a strong form of the prime number theorem.

1. Introduction

In this paper, we prove the following result:

Every sufficiently large odd integer ${N}$ is the sum of three prime numbers.

In fact, the following result is also true, called the “weak Goldbach conjecture”.

Theorem 2 (Weak Goldbach conjecture)

Every odd integer ${N \ge 7}$ is the sum of three prime numbers.

The proof of Vinogradov’s theorem becomes significantly simpler if one assumes the generalized Riemann hypothesis; this allows one to use a strong form of the prime number theorem (Theorem 9). This conditional proof was given by Hardy and Littlewood in the 1923’s. In 1997, Deshouillers, Effinger, te Riele and Zinoviev showed that the generalized Riemann hypothesis in fact also implies the weak Goldbach conjecture by improving the bound to ${10^{20}}$ and then exhausting the remaining cases via a computer search.

As for unconditional proofs, Vinogradov was able to eliminate the dependency on the generalized Riemann hypothesis in 1937, which is why the Theorem 1 bears his name. However, Vinogradov’s bound used the ineffective Siegel-Walfisz theorem; his student K. Borozdin showed that ${3^{3^{15}}}$ is large enough. Over the years the bound was improved, until recently in 2013 when Harald Helfgott claimed the first unconditional proof of Theorem 2, see here.

In this exposition we follow Hardy and Littlewood’s approach, i.e. we prove Theorem 1 assuming the generalized Riemann hypothesis, following the exposition of Rhee. An exposition of the unconditional proof by Vinogradov is given by Rouse.

2. Synopsis

We are going to prove that

$\displaystyle \sum_{a+b+c = N} \Lambda(a) \Lambda(b) \Lambda(c) \asymp \frac12 N^2 \mathfrak G(N) \ \ \ \ \ (1)$

where

$\displaystyle \mathfrak G(N) \overset{\text{def}}{=} \prod_{p \mid N} \left( 1 - \frac{1}{(p-1)^2} \right) \prod_{p \nmid N} \left( 1 + \frac{1}{(p-1)^3} \right)$

and ${\Lambda}$ is the von Mangoldt function defined as usual. Then so long as ${2 \nmid N}$, the quantity ${\mathfrak G(N)}$ will be bounded away from zero; thus (1) will imply that in fact there are many ways to write ${N}$ as the sum of three distinct prime numbers.

The sum (1) is estimated using Fourier analysis. Let us define the following.

Definition 3

Let ${\mathbb T = \mathbb R/\mathbb Z}$ denote the circle group, and let ${e : \mathbb T \rightarrow \mathbb C}$ be the exponential function ${\theta \mapsto \exp(2\pi i \theta)}$. For ${\alpha\in\mathbb T}$, ${\|\alpha\|}$ denotes the minimal distance from ${\alpha}$ to an integer.

Note that ${|e(\theta)-1|=\Theta(\|\theta\|)}$.

Definition 4

For ${\alpha \in \mathbb T}$ and ${x > 0}$ we define

$\displaystyle S(x, \alpha) = \sum_{n \le x} \Lambda(n) e(n\alpha).$

Then we can rewrite (1) using ${S}$ as a “Fourier coefficient”:

Proof: We have

$\displaystyle S(N,\alpha)^3=\sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)e((a+b+c)\alpha),$

so

\displaystyle \begin{aligned} \int_{\alpha \in \mathbb T} S(N, \alpha)^3 e(-N\alpha) \; d\alpha &= \int_{\alpha \in \mathbb T} \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)e((a+b+c)\alpha) e(-N\alpha) \; d\alpha \\ &= \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)\int_{\alpha \in \mathbb T}e((a+b+c-N)\alpha) \; d\alpha \\ &= \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)I(a+b+c=N) \\ &= \sum_{a+b+c=N}\Lambda(a)\Lambda(b)\Lambda(c), \end{aligned}

as claimed. $\Box$

In order to estimate the integral in Proposition 5, we divide ${\mathbb T}$ into the so-called “major” and “minor” arcs. Roughly,

• The “major arcs” are subintervals of ${\mathbb T}$ centered at a rational number with small denominator.
• The “minor arcs” are the remaining intervals.

These will be made more precise later. This general method is called the Hardy-Littlewood circle method, because of the integral over the circle group ${\mathbb T}$.

The rest of the paper is structured as follows. In Section 3, we define the Dirichlet character and other number-theoretic objects, and state some estimates for the partial sums of these objects conditioned on the Riemann hypothesis. These bounds are then used in Section 4 to provide corresponding estimates on ${S(x, \alpha)}$. In Section 5 we then define the major and minor arcs rigorously and use the previous estimates to given an upper bound for the integral over both areas. Finally, we complete the proof in Section 6.

3. Prime number theorem type bounds

In this section, we collect the necessary number-theoretic results that we will need. It is in this section only that we will require the generalized Riemann hypothesis.

As a reminder, the notation ${f(x)\ll g(x)}$, where ${f}$ is a complex function and ${g}$ a nonnegative real one, means ${f(x)=O(g(x))}$, a statement about the magnitude of ${f}$. Likewise, ${f(x)=g(x)+O(h(x))}$ simply means that for some ${C}$, ${|f(x)-g(x)|\leq C|h(x)|}$ for all sufficiently large ${x}$.

3.1. Dirichlet characters

In what follows, ${q}$ denotes a positive integer.

Definition 6

A Dirichlet character modulo ${q}$ ${\chi}$ is a homomorphism ${\chi : (\mathbb Z/q)^\times \rightarrow \mathbb C^\times}$. It is said to be trivial if ${\chi = 1}$; we denote this character by ${\chi_0}$.

By slight abuse of notation, we will also consider ${\chi}$ as a function ${\mathbb Z \rightarrow \mathbb C^\ast}$ by setting ${\chi(n) = \chi(n \pmod q)}$ for ${\gcd(n,q) = 1}$ and ${\chi(n) = 0}$ for ${\gcd(n,q) > 1}$.

Remark 7

The Dirichlet characters form a multiplicative group of order ${\phi(q)}$ under multiplication, with inverse given by complex conjugation. Note that ${\chi(m)}$ is a primitive ${\phi(q)}$th root of unity for any ${m \in (\mathbb Z/q)^\times}$, thus ${\chi}$ takes values in the unit circle.

Moreover, the Dirichlet characters satisfy an orthogonality relation

Experts may recognize that the Dirichlet characters are just the elements of the Pontryagin dual of ${(\mathbb Z/q)^\times}$. In particular, they satisfy an orthogonality relationship

$\displaystyle \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \chi(n) \overline{\chi(a)} = \begin{cases} 1 & n = a \pmod q \\ 0 & \text{otherwise} \end{cases} \ \ \ \ \ (3)$

and thus form an orthonormal basis for functions ${(\mathbb Z/q)^\times \rightarrow \mathbb C}$.

3.2. Prime number theorem for arithmetic progressions

Definition 8

The generalized Chebyshev function is defined by

$\displaystyle \psi(x, \chi) = \sum_{n \le x} \Lambda(n) \chi(n).$

The Chebyshev function is studied extensively in analytic number theory, as it is the most convenient way to phrase the major results of analytic number theory. For example, the prime number theorem is equivalent to the assertion that

$\displaystyle \psi(x, \chi_0) = \sum_{n \le x} \Lambda(n) \asymp x$

where ${q = 1}$ (thus ${\chi_0}$ is the constant function ${1}$). Similarly, Dirichlet’s theorem actually asserts that any ${q \ge 1}$,

$\displaystyle \psi(x, \chi) = \begin{cases} x + o_q(x) & \chi = \chi_0 \text{ trivial} \\ o_q(x) & \chi \neq \chi_0 \text{ nontrivial}. \end{cases}$

However, the error term in these estimates is quite poor (more than ${x^{1-\varepsilon}}$ for every ${\varepsilon}$). However, by assuming the Riemann Hypothesis for a certain “${L}$-function” attached to ${\chi}$, we can improve the error terms substantially.

Theorem 9 (Prime number theorem for arithmetic progressions)

Let ${\chi}$ be a Dirichlet character modulo ${q}$, and assume the Riemann hypothesis for the ${L}$-function attached to ${\chi}$.

1. If ${\chi}$ is nontrivial, then

$\displaystyle \psi(x, \chi) \ll \sqrt{x} (\log qx)^2.$

2. If ${\chi = \chi_0}$ is trivial, then

$\displaystyle \psi(x, \chi_0) = x + O\left( \sqrt x (\log x)^2 + \log q \log x \right).$

Theorem 9 is the strong estimate that we will require when putting good estimates on ${S(x, \alpha)}$, and is the only place in which the generalized Riemann Hypothesis is actually required.

3.3. Gauss sums

Definition 10

For ${\chi}$ a Dirichlet character modulo ${q}$, the Gauss sum ${\tau(\chi)}$ is defined by

$\displaystyle \tau(\chi)=\sum_{a=0}^{q-1}\chi(a)e(a/q).$

We will need the following fact about Gauss sums.

Lemma 11

Consider Dirichlet characters modulo ${q}$. Then:

1. We have ${\tau(\chi_0) = \mu(q)}$.
2. For any ${\chi}$ modulo ${q}$, ${\left\lvert \tau(\chi) \right\rvert \le \sqrt q}$.

3.4. Dirichlet approximation

We finally require Dirichlet approximation theorem in the following form.

Theorem 12 (Dirichlet approximation)

Let ${\alpha \in \mathbb R}$ be arbitrary, and ${M}$ a fixed integer. Then there exists integers ${a}$ and ${q = q(\alpha)}$, with ${1 \le q \le M}$ and ${\gcd(a,q) = 1}$, satisfying

$\displaystyle \left\lvert \alpha - \frac aq \right\rvert \le \frac{1}{qM}.$

4. Bounds on ${S(x, \alpha)}$

In this section, we use our number-theoretic results to bound ${S(x,\alpha)}$.

First, we provide a bound for ${S(x,\alpha)}$ if ${\alpha}$ is a rational number with “small” denominator ${q}$.

Lemma 13

Let ${\gcd(a,q) = 1}$. Assuming Theorem 9, we have

$\displaystyle S(x, a/q) = \frac{\mu(q)}{\phi(q)} x + O\left( \sqrt{qx} (\log qx)^2 \right)$

where ${\mu}$ denotes the Möbius function.

Proof: Write the sum as

$\displaystyle S(x, a/q) = \sum_{n \le x} \Lambda(n) e(na/q).$

First we claim that the terms ${\gcd(n,q) > 1}$ (and ${\Lambda(n) \neq 0}$) contribute a negligibly small ${\ll \log q \log x}$. To see this, note that

• The number ${q}$ has ${\ll \log q}$ distinct prime factors, and
• If ${p^e \mid q}$, then ${\Lambda(p) + \dots + \Lambda(p^e) = e\log p = \log(p^e) < \log x}$.

So consider only terms with ${\gcd(n,q) = 1}$. To bound the sum, notice that

\displaystyle \begin{aligned} e(n \cdot a/q) &= \sum_{b \text{ mod } q} e(b/q) \cdot \mathbf 1(b \equiv an) \\ &= \sum_{b \text{ mod } q} e(b/q) \left( \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \chi(b) \overline{\chi(an)} \right) \end{aligned}

by the orthogonality relations. Now we swap the order of summation to obtain a Gauss sum:

\displaystyle \begin{aligned} e(n \cdot a/q) &= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(an)} \left( \sum_{b \text{ mod } q} \chi(b) e(b/q) \right) \\ &= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(an)} \tau(\chi). \end{aligned}

Thus, we swap the order of summation to obtain that

\displaystyle \begin{aligned} S(x, \alpha) &= \sum_{\substack{n \le x \\ \gcd(n,q) = 1}} \Lambda(n) e(n \cdot a/q) \\ &= \frac{1}{\phi(q)} \sum_{\substack{n \le x \\ \gcd(n,q) = 1}} \sum_{\chi \text{ mod } q} \Lambda(n) \overline{\chi(an)} \tau(\chi) \\ &= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \tau(\chi) \sum_{\substack{n \le x \\ \gcd(n,q) = 1}} \Lambda(n) \overline{\chi(an)} \\ &= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(a)} \tau(\chi) \sum_{\substack{n \le x \\ \gcd(n,q) = 1}} \Lambda(n)\overline{\chi(n)} \\ &= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(a)} \tau(\chi) \psi(x, \overline\chi) \\ &= \frac{1}{\phi(q)} \left( \tau(\chi_0) \psi(x, \chi_0) + \sum_{1 \neq \chi \text{ mod } q} \overline{\chi(a)} \tau(\chi) \psi(x, \overline\chi) \right). \end{aligned}

Now applying both parts of Lemma 11 in conjunction with Theorem 9 gives

\displaystyle \begin{aligned} S(x,\alpha) &= \frac{\mu(q)}{\phi(q)} \left( x + O\left( \sqrt x (\log qx)^2 \right) \right) + O\left( \sqrt x (\log x)^2 \right) \\ &= \frac{\mu(q)}{\phi(q)} x + O\left( \sqrt{qx} (\log qx)^2 \right) \end{aligned}

as desired. $\Box$

We then provide a bound when ${\alpha}$ is “close to” such an ${a/q}$.

Lemma 14

Let ${\gcd(a,q) = 1}$ and ${\beta \in \mathbb T}$. Assuming Theorem 9, we have

$\displaystyle S(x, a/q + \beta) = \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(\beta n) \right) + O\left( (1+\|\beta\|x) \sqrt{qx} (\log qx)^2 \right).$

Proof: For convenience let us assume ${x \in \mathbb Z}$. Let ${\alpha = a/q + \beta}$. Let us denote ${\text{Err}(x, \alpha) = S(x,\alpha) - \frac{\mu(q)}{\phi(q)} x}$, so by Lemma 13 we have ${\text{Err}(x,\alpha) \ll \sqrt{qx}(\log x)^2}$. We have

\displaystyle \begin{aligned} S(x, \alpha) &= \sum_{n \le x} \Lambda(n) e(na/q) e(n\beta) \\ &= \sum_{n \le x} e(n\beta) \left( S(n, a/q) - S(n-1, a/q) \right) \\ &= \sum_{n \le x} e(n\beta) \left( \frac{\mu(q)}{\phi(q)} + \text{Err}(n, \alpha) - \text{Err}(n-1, \alpha) \right) \\ &= \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) + \sum_{1 \le m \le x-1} \left( e( (m+1)\beta) - e( m\beta ) \right) \text{Err}(m, \alpha) \\ &\qquad + e(x\beta) \text{Err}(x, \alpha) - e(0) \text{Err}(0, \alpha) \\ &\le \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) + \left( \sum_{1 \le m \le x-1} \|\beta\| \text{Err}(m, \alpha) \right) + \text{Err}(0, \alpha) + \text{Err}(x, \alpha) \\ &\ll \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) + \left( 1+x\left\| \beta \right\| \right) O\left( \sqrt{qx} (\log qx)^2 \right) \end{aligned}

as desired. $\Box$

Thus if ${\alpha}$ is close to a fraction with small denominator, the value of ${S(x, \alpha)}$ is bounded above. We can now combine this with the Dirichlet approximation theorem to obtain the following general result.

Corollary 15

Suppose ${M = N^{2/3}}$ and suppose ${\left\lvert \alpha - a/q \right\rvert \le \frac{1}{qM}}$ for some ${\gcd(a,q) = 1}$ with ${q \le M}$. Assuming Theorem 9, we have

$\displaystyle S(x, \alpha) \ll \frac{x}{\varphi(q)} + x^{\frac56+\varepsilon}$

for any ${\varepsilon > 0}$.

Proof: Apply Lemma 14 directly. $\Box$

5. Estimation of the arcs

We’ll write

$\displaystyle f(\alpha) \overset{\text{def}}{=} S(N,\alpha)=\sum_{n \le N} \Lambda(n)e(n\alpha)$

for brevity in this section.

Recall that we wish to bound the right-hand side of (2) in Proposition 5. We split ${[0,1]}$ into two sets, which we call the “major arcs” and the “minor arcs.” To do so, we use Dirichlet approximation, as hinted at earlier.

In what follows, fix

\displaystyle \begin{aligned} M &= N^{2/3} \\ K &= (\log N)^{10}. \end{aligned}

5.1. Setting up the arcs

Definition 16

For ${q \le K}$ and ${\gcd(a,q) = 1}$, ${1 \le a \le q}$, we define

$\displaystyle \mathfrak M(a,q) = \left\{ \alpha \in \mathbb T \mid \left\lvert \alpha - \frac aq \right\rvert \le \frac 1M \right\}.$

These will be the major arcs. The union of all major arcs is denoted by ${\mathfrak M}$. The complement is denoted by ${\mathfrak m}$.

Equivalently, for any ${\alpha}$, consider ${q = q(\alpha) \le M}$ as in Theorem 12. Then ${\alpha \in \mathfrak M}$ if ${q \le K}$ and ${\alpha \in \mathfrak m}$ otherwise.

Proposition 17

${\mathfrak M}$ is composed of finitely many disjoint intervals ${\mathfrak M(a,q)}$ with ${q \le K}$. The complement ${\mathfrak m}$ is nonempty.

Proof: Note that if ${q_1, q_2 \le K}$ and ${a/q_1 \neq b/q_2}$ then ${\left\lvert \frac{a}{q_1} - \frac{b}{q_2} \right\rvert \ge \frac{1}{q_1q_2} \gg \frac{3}{qM}}$. $\Box$

In particular both ${\mathfrak M}$ and ${\mathfrak m}$ are measurable. Thus we may split the integral in (2) over ${\mathfrak M}$ and ${\mathfrak m}$. This integral will have large magnitude on the major arcs, and small magnitude on the minor arcs, so overall the whole interval ${[0,1]}$ it will have large magnitude.

5.2. Estimate of the minor arcs

First, we note the well known fact ${\phi(q) \gg q/\log q}$. Note also that if ${q=q(\alpha)}$ as in the last section and ${\alpha}$ is on a minor arc, we have ${q > (\log N)^{10}}$, and thus ${\phi(q) \gg (\log N)^{9}}$.

As such Corollary 3.3 yields that ${f(\alpha) \ll \frac{N}{\phi(q)}+N^{.834} \ll \frac{N}{(\log N)^9}}$.

Now,

\displaystyle \begin{aligned} \left\lvert \int_{\mathfrak m}f(\alpha)^3e(-N\alpha) \; d\alpha \right\rvert &\le \int_{\mathfrak m}\left\lvert f(\alpha)\right\rvert ^3 \; d\alpha \\ &\ll \frac{N}{(\log N)^9} \int_{0}^{1}\left\lvert f(\alpha)\right\rvert ^2 \;d\alpha \\ &=\frac{N}{(\log N)^9}\int_{0}^{1}f(\alpha)f(-\alpha) \; d\alpha \\ &=\frac{N}{(\log N)^9}\sum_{n \le N} \Lambda(n)^2 \\ &\ll \frac{N^2}{(\log N)^8}, \end{aligned}

using the well known bound ${\sum_{n \le N} \Lambda(n)^2 \ll \frac{N}{\log N}}$. This bound of ${\frac{N^2}{(\log N)^8}}$ will be negligible compared to lower bounds for the major arcs in the next section.

5.3. Estimate on the major arcs

We show that

$\displaystyle \int_{\mathfrak M}f(\alpha)^3e(-N\alpha) d\alpha \asymp \frac{N^2}{2} \mathfrak G(N).$

By Proposition 17 we can split the integral over each interval and write

$\displaystyle \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha = \sum_{q \le (\log N)^{10}}\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} \int_{-1/qM}^{1/qM}f(a/q+\beta)^3e(-N(a/q+\beta)) \; d\beta.$

Then we apply Lemma 14, which gives

\displaystyle \begin{aligned} f(a/q+\beta)^3 &= \left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n) \right)^3 \\ &+\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right)^2 O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right) \\ &+\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right) O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right)^2 \\ &+O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right)^3. \end{aligned}

Now, we can do casework on the side of ${N^{-.9}}$ that ${\|\beta\|}$ lies on.

• If ${\|\beta\| \gg N^{-.9}}$, we have ${\sum_{n \le N}e(\beta n) \ll \frac{2}{|e(\beta)-1|} \ll \frac{1}{\|\beta\|} \ll N^{.9}}$, and ${(1+\|\beta\|N)\sqrt{qN} \log^2 qN \ll N^{5/6+\varepsilon}}$, because certainly we have ${\|\beta\|<1/M=N^{-2/3}}$.
• If on the other hand ${\|\beta\|\ll N^{-.9}}$, we have ${\sum_{n \le N}e(\beta n) \ll N}$ obviously, and ${O(1+\|\beta\|N)\sqrt{qN} \log^2 qN) \ll N^{3/5+\varepsilon}}$.

As such, we obtain

$\displaystyle f(a/q+\beta)^3 \ll \left( \frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n) \right)^3 + O\left(N^{79/30+\varepsilon}\right)$

in either case. Thus, we can write

\displaystyle \begin{aligned} &\qquad \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha \\ &= \sum_{q \le (\log N)^{10}} \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} \int_{-1/qM}^{1/qM} f(a/q+\beta)^3e(-N(a/q+\beta)) \; d\beta \\ &= \sum_{q \le (\log N)^{10}} \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} \int_{-1/qM}^{1/qM}\left[\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right)^3 + O\left(N^{79/30+\varepsilon}\right)\right]e(-N(a/q+\beta)) \; d\beta \\ &=\sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3} S_q \left(\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} e(-N(a/q))\right) \left( \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n)\right)^3e(-N\beta) \; d\beta \right ) \\ &\qquad +O\left(N^{59/30+\varepsilon}\right). \end{aligned}

just using ${M \le N^{2/3}}$. Now, we use

$\displaystyle \sum_{n \le N}e(\beta n) = \frac{1-e(\beta N)}{1-e(\beta)} \ll \frac{1}{\|\beta\|}.$

This enables us to bound the expression

$\displaystyle \int_{1/qM}^{1-1/qM}\left (\sum_{n \le N}e(\beta n)\right) ^ 3 e(-N\beta)d\beta \ll \int_{1/qM}^{1-1/qM}\|\beta\|^{-3} d\beta = 2\int_{1/qM}^{1/2}\beta^{-3} d\beta \ll q^2M^2.$

But the integral over the entire interval is

\displaystyle \begin{aligned} \int_{0}^{1}\left(\sum_{n \le N}e(\beta n) \right)^3 e(-N\beta)d\beta &= \int_{0}^{1} \sum_{a,b,c \le N} e((a+b+c-N)\beta) \\ &\ll \sum_{a,b,c \le N} \mathbf 1(a+b+c=N) \\ &= \binom{N-1}{2}. \end{aligned}

Considering the difference of the two integrals gives

$\displaystyle \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n) \right)^3 e(-N\beta) \; d\beta - \frac{N^2}{2} \ll q^2 M^2 + N \ll (\log N)^c N^{4/3},$

for some absolute constant ${c}$.

For brevity, let

$\displaystyle S_q = \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} e(-N(a/q)).$

Then

\displaystyle \begin{aligned} \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha &= \sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3}S_q \left( \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n)\right)^3e(-N\beta) \; d\beta \right ) \\ &\qquad +O\left(N^{59/30+\varepsilon}\right) \\ &= \frac{N^2}{2}\sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3}S_q + O((\log N)^{10+c} N^{4/3}) + O(N^{59/30+\varepsilon}) \\ &= \frac{N^2}{2}\sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3} + O(N^{59/30+\varepsilon}). \end{aligned}

.

The inner sum is bounded by ${\phi(q)}$. So,

$\displaystyle \left\lvert \sum_{q>(\log N)^{10}} \frac{\mu(q)}{\phi(q)^3} S_q \right\rvert \le \sum_{q>(\log N)^{10}} \frac{1}{\phi(q)^2},$

which converges since ${\phi(q)^2 \gg q^c}$ for some ${c > 1}$. So

$\displaystyle \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha = \frac{N^2}{2}\sum_{q = 1}^\infty \frac{\mu(q)}{\phi(q)^3}S_q + O(N^{59/30+\varepsilon}).$

Now, since ${\mu(q)}$, ${\phi(q)}$, and ${\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} e(-N(a/q))}$ are multiplicative functions of ${q}$, and ${\mu(q)=0}$ unless ${q}$ is squarefree,

\displaystyle \begin{aligned} \sum_{q = 1}^\infty \frac{\mu(q)}{\phi(q)^3} S_q &= \prod_p \left(1+\frac{\mu(p)}{\phi(p)^3}S_p \right) \\ &= \prod_p \left(1-\frac{1}{(p-1)^3} \sum_{a=1}^{p-1} e(-N(a/p))\right) \\ &= \prod_p \left(1-\frac{1}{(p-1)^3}\sum_{a=1}^{p-1} (p\cdot \mathbf 1(p|N) - 1)\right) \\ &= \prod_{p|N}\left(1-\frac{1}{(p-1)^2}\right) \prod_{p \nmid N}\left(1+\frac{1}{(p-1)^3}\right) \\ &= \mathfrak G(N). \end{aligned}

So,

$\displaystyle \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha = \frac{N^2}{2}\mathfrak{G}(N) + O(N^{59/30+\varepsilon}).$

When ${N}$ is odd,

$\displaystyle \mathfrak{G}(N) = \prod_{p|N}\left(1-\frac{1}{(p-1)^2}\right)\prod_{p \nmid N}\left(1+\frac{1}{(p-1)^3}\right)\geq \prod_{m\geq 3}\left(\frac{m-2}{m-1}\frac{m}{m-1}\right)=\frac{1}{2},$

so that we have

$\displaystyle \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha \asymp \frac{N^2}{2}\mathfrak{G}(N),$

as desired.

6. Completing the proof

Because the integral over the minor arc is ${o(N^2)}$, it follows that

$\displaystyle \sum_{a+b+c=N} \Lambda(a)\Lambda(b)\Lambda(c) = \int_{0}^{1} f(\alpha)^3 e(-N\alpha) d \alpha \asymp \frac{N^2}{2}\mathfrak{G}(N) \gg N^2.$

Consider the set ${S_N}$ of integers ${p^k\leq N}$ with ${k>1}$. We must have ${p \le N^{\frac{1}{2}}}$, and for each such ${p}$ there are at most ${O(\log N)}$ possible values of ${k}$. As such, ${|S_N| \ll\pi(N^{1/2}) \log N\ll N^{1/2}}$.

Thus

$\displaystyle \sum_{\substack{a+b+c=N \\ a\in S_N}} \Lambda(a)\Lambda(b)\Lambda(c) \ll (\log N)^3 |S|N \ll\log(N)^3 N^{3/2},$

and similarly for ${b\in S_N}$ and ${c\in S_N}$. Notice that summing over ${a\in S_N}$ is equivalent to summing over composite ${a}$, so

$\displaystyle \sum_{p_1+p_2+p_3=N} \Lambda(p_1)\Lambda(p_2)\Lambda(p_3) =\sum_{a+b+c=N} \Lambda(a)\Lambda(b)\Lambda(c) + O(\log(N)^3 N^{3/2}) \gg N^2,$

where the sum is over primes ${p_i}$. This finishes the proof.

Miller-Rabin (for MIT 18.434)

This is a transcript of a talk I gave as part of MIT’s 18.434 class, the “Seminar in Theoretical Computer Science” as part of MIT’s communication requirement. (Insert snarky comment about MIT’s CI-* requirements here.) It probably would have made a nice math circle talk for high schoolers but I felt somewhat awkward having to present it to a bunch of students who were clearly older than me.

1. Preliminaries

1.1. Modular arithmetic

In middle school you might have encountered questions such as

Exercise 1

What is ${3^{2016} \pmod{10}}$?

You could answer such questions by listing out ${3^n}$ for small ${n}$ and then finding a pattern, in this case of period ${4}$. However, for large moduli this “brute-force” approach can be time-consuming.

Fortunately, it turns out that one can predict the period in advance.

Theorem 2 (Euler’s little theorem)

1. Let ${\gcd(a,n) = 1}$. Then ${a^{\phi(n)} \equiv 1 \pmod n}$.
2. (Fermat) If ${p}$ is a prime, then ${a^p \equiv a \pmod p}$ for every ${a}$.

Proof: Part (a) is a special case of Lagrange’s Theorem: if ${G}$ is a finite group and ${g \in G}$, then ${g^{|G|}}$ is the identity element. Now select ${G = (\mathbb Z/n\mathbb Z)^\times}$. Part (b) is the case ${n=p}$. $\Box$

Thus, in the middle school problem we know in advance that ${3^4 \equiv 1 \pmod{10}}$ because ${\phi(10) = 4}$. This bound is sharp for primes:

Theorem 3 (Primitive roots)

For every ${p}$ prime there’s a ${g \pmod p}$ such that ${g^{p-1} \equiv 1 \pmod p}$ but ${g^{k} \not\equiv 1 \pmod p}$ for any ${k < p-1}$. (Hence ${(\mathbb Z/p\mathbb Z)^\times \cong \mathbb Z/(p-1)}$.)

For a proof, see the last exercise of my orders handout.

We will define the following anyways:

Definition 4

We say an integer ${n}$ (thought of as an exponent) annihilates the prime ${p}$ if

• ${a^n \equiv 1 \pmod p}$ for every ${a \not\equiv 0 \pmod p}$,
• or equivalently, ${p-1 \mid n}$.

Theorem 5 (All/nothing)

Suppose an exponent ${n}$ does not annihilate the prime ${p}$. Then more than ${\frac{1}{2} p}$ of ${x \pmod p}$ satisfy ${x^n \not\equiv 1 \pmod p}$.

Proof: Much stronger result is true: in ${x^n \equiv 1 \pmod p}$ then ${x^{\gcd(n,p-1)} \equiv 1 \pmod p}$. $\Box$

1.2. Repeated Exponentiation

Even without the previous facts, one can still do:

Theorem 6 (Repeated exponentation)

Given ${x}$ and ${n}$, one can compute ${x^n \pmod N}$ with ${O(\log n)}$ multiplications mod ${N}$.

The idea is that to compute ${x^{600} \pmod N}$, one just multiplies ${x^{512+64+16+8}}$. All the ${x^{2^k}}$ can be computed in ${k}$ steps, and ${k \le \log_2 n}$.

1.3. Chinese remainder theorem

In the middle school problem, we might have noticed that to compute ${3^{2016} \pmod{10}}$, it suffices to compute it modulo ${5}$, because we already know it is odd. More generally, to understand ${\pmod n}$ it suffices to understand ${n}$ modulo each of its prime powers.

The formal statement, which we include for completeness, is:

Theorem 7 (Chinese remainder theorem)

Let ${p_1}$, ${p_2}$, \dots, ${p_m}$ be distinct primes, and ${e_i \ge 1}$ integers. Then there is a ring isomorphism given by the natural projection

$\displaystyle \mathbb Z/n \rightarrow \prod_{i=1}^m \mathbb Z/p_i^{e_i}.$

In particular, a random choice of ${x \pmod n}$ amounts to a random choice of ${x}$ mod each prime power.

For an example, in the following table we see the natural bijection between ${x \pmod{15}}$ and ${(x \pmod 3, x \pmod 5)}$.

$\displaystyle \begin{array}{c|cc} x \pmod{15} & x \pmod{3} & x \pmod{5} \\ \hline 0 & 0 & 0 \\ 1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 0 & 3 \\ 4 & 1 & 4 \\ 5 & 2 & 0 \\ 6 & 0 & 1 \\ 7 & 1 & 2 \end{array} \quad \begin{array}{c|cc} x \pmod{15} & x \pmod{3} & x \pmod{5} \\ \hline 8 & 2 & 3 \\ 9 & 0 & 4 \\ 10 & 1 & 0 \\ 11 & 2 & 1 \\ 12 & 0 & 2 \\ 13 & 1 & 3 \\ 14 & 2 & 4 \\ && \end{array}$

2. The RSA algorithm

This simple number theory is enough to develop the so-called RSA algorithm. Suppose Alice wants to send Bob a message ${M}$ over an insecure channel. They can do so as follows.

• Bob selects integers ${d}$, ${e}$ and ${N}$ (with ${N}$ huge) such that ${N}$ is a semiprime and

$\displaystyle de \equiv 1 \pmod{\phi(N)}.$

• Bob publishes both the number ${N}$ and ${e}$ (the public key) but keeps the number ${d}$ secret (the private key).
• Alice sends the number ${X = M^e \pmod N}$ across the channel.
• Bob computes

$\displaystyle X^d \equiv M^{de} \equiv M^1 \equiv M \pmod N$

and hence obtains the message ${M}$.

In practice, the ${N}$ in RSA is at least ${2000}$ bits long.

The trick is that an adversary cannot compute ${d}$ from ${e}$ and ${N}$ without knowing the prime factorization of ${N}$. So the security relies heavily on the difficulty of factoring.

Remark 8

It turns out that we basically don’t know how to factor large numbers ${N}$: the best known classical algorithms can factor an ${n}$-bit number in

$\displaystyle O\left( \exp\left( \frac{64}{9} n \log(n)^2 \right)^{1/3} \right)$

time (“general number field sieve”). On the other hand, with a quantum computer one can do this in ${O\left( n^2 \log n \log \log n \right)}$ time.

3. Primality Testing

Main question: if we can’t factor a number ${n}$ quickly, can we at least check it’s prime?

In what follows, we assume for simplicity that ${n}$ is squarefree, i.e. ${n = p_1 p_2 \dots p_k}$ for distinct primes ${p_k}$, This doesn’t substantially change anything, but it makes my life much easier.

3.1. Co-RP

Here is the goal: we need to show there is a random algorithm ${A}$ which does the following.

• If ${n}$ is composite then
• More than half the time ${A}$ says “definitely composite”.
• Occasionally, ${A}$ says “possibly prime”.
• If ${n}$ is prime, ${A}$ always says “possibly prime”.

If there is a polynomial time algorithm ${A}$ that does this, we say that PRIMES is in Co-RP. Clearly, this is a very good thing to be true!

3.2. Fermat

One idea is to try to use the converse of Fermat’s little theorem: given an integer ${n}$, pick a random number ${x \pmod n}$ and see if ${x^{n-1} \equiv 1 \pmod n}$. (We compute using repeated exponentiation.) If not, then we know for sure ${n}$ is not prime, and we call ${x}$ a Fermat witness modulo ${n}$.

How good is this test? For most composite ${n}$, pretty good:

Proposition 9

Let ${n}$ be composite. Assume that there is a prime ${p \mid n}$ such that ${n-1}$ does not annihilate ${p}$. Then over half the numbers mod ${n}$ are Fermat witnesses.

Proof: Apply the Chinese theorem then the “all-or-nothing” theorem. $\Box$
Unfortunately, if ${n}$ doesn’t satisfy the hypothesis, then all the ${\gcd(x,n) = 1}$ satisfy ${x^{n-1} \equiv 1 \pmod n}$!

Are there such ${n}$ which aren’t prime? Such numbers are called Carmichael numbers, but unfortunately they exist, the first one is ${561 = 3 \cdot 11 \cdot 17}$.

Remark 10

For ${X \gg 1}$, there are more than ${X^{1/3}}$ Carmichael numbers at most ${X}$.

Thus these numbers are very rare, but they foil the Fermat test.

Exercise 11

Show that a Carmichael number is not a semiprime.

3.3. Rabin-Miller

Fortunately, we can adapt the Fermat test to cover Carmichael numbers too. It comes from the observation that if ${n}$ is prime, then ${a^2 \equiv 1 \pmod n \implies a \equiv \pm 1 \pmod n}$.

So let ${n-1 = 2^s t}$, where ${t}$ is odd. For example, if ${n = 561}$ then ${560 = 2^4 \cdot 35}$. Then we compute ${x^t}$, ${x^{2t}}$, \dots, ${x^{n-1}}$. For example in the case ${n=561}$ and ${x=245}$:

$\displaystyle \begin{array}{c|r|rrr} & \mod 561 & \mod 3 & \mod 11 & \mod 17 \\ \hline x & 245 & -1 & 3 & 7 \\ \hline x^{35} & 122 & -1 & \mathbf 1 & 3 \\ x^{70} & 298 & \mathbf 1 & 1 & 9 \\ x^{140} & 166 & 1 & 1 & -4 \\ x^{280} & 67 & 1 & 1 & -1 \\ x^{560} & 1 & 1 & 1 & \mathbf 1 \end{array}$

And there we have our example! We have ${67^2 \equiv 1 \pmod{561}}$, so ${561}$ isn’t prime.

So the Rabin-Miller test works as follows:

• Given ${n}$, select a random ${x}$ and compute powers of ${x}$ as in the table.
• If ${x^{n-1} \not\equiv 1}$, stop, ${n}$ is composite (Fermat test).
• If ${x^{n-1} \equiv 1}$, see if the entry just before the first ${1}$ is ${-1}$. If it isn’t then we say ${x}$ is a RM-witness and ${n}$ is composite.
• Otherwise, ${n}$ is “possibly prime”.

How likely is probably?

Theorem 12

If ${n}$ is Carmichael, then over half the ${x \pmod n}$ are RM witnesses.

Proof: We sample ${x \pmod n}$ randomly again by looking modulo each prime (Chinese theorem). By the theorem on primitive roots, show that the probability the first ${-1}$ appears in any given row is ${\le \frac{1}{2}}$. This implies the conclusion. $\Box$

Exercise 13

Improve the ${\frac{1}{2}}$ in the problem to ${\frac34}$ by using the fact that Carmichael numbers aren’t semiprime.

3.4. AKS

In August 6, 2002, it was in fact shown that PRIMES is in P, using the deterministic AKS algorithm. However, in practice everyone still uses Miller-Rabin since the implied constants for AKS runtime are large.

Artin Reciprocity

I will tell you a story about the Reciprocity Law. After my thesis, I had the idea to define ${L}$-series for non-abelian extensions. But for them to agree with the ${L}$-series for abelian extensions, a certain isomorphism had to be true. I could show it implied all the standard reciprocity laws. So I called it the General Reciprocity Law and tried to prove it but couldn’t, even after many tries. Then I showed it to the other number theorists, but they all laughed at it, and I remember Hasse in particular telling me it couldn’t possibly be true.

Still, I kept at it, but nothing I tried worked. Not a week went by — for three years! — that I did not try to prove the Reciprocity Law. It was discouraging, and meanwhile I turned to other things. Then one afternoon I had nothing special to do, so I said, `Well, I try to prove the Reciprocity Law again.’ So I went out and sat down in the garden. You see, from the very beginning I had the idea to use the cyclotomic fields, but they never worked, and now I suddenly saw that all this time I had been using them in the wrong way — and in half an hour I had it.

— Emil Artin

Algebraic number theory assumed (e.g. the ANT chapters of Napkin). In this post, I’m going to state some big theorems of global class field theory and use them to deduce the Kronecker-Weber plus Hilbert class fields. For experts: this is global class field theory, without ideles.

Here’s the executive summary: let ${K}$ be a number field. Then all abelian extensions ${L/K}$ can be understood using solely information intrinsic to ${K}$: namely, the ray class groups (generalizing ideal class groups).

1. Infinite primes

Let ${K}$ be a number field of degree ${n}$. We know what a prime ideal of ${\mathcal O_K}$ is, but we now allow for the so-called infinite primes, which I’ll describe using embeddings. We know there are ${n}$ embeddings ${\sigma : K \rightarrow \mathbb C}$, which consist of

• ${r}$ real embeddings where ${\mathop{\mathrm{im}}\sigma \subseteq \mathbb R}$, and
• ${s}$ pairs of conjugate complex embeddings.

Hence ${r+2s = n}$. The first class of embeddings form the real infinite primes, while the complex infinite primes are the second type. We say ${K}$ is totally real (resp totally complex) if all its infinite primes are real (resp complex).

Example 1 (Examples of infinite primes)

• ${\mathbb Q}$ has a single real infinite prime. We often write it as ${\infty}$.
• ${\mathbb Q(\sqrt{-5})}$ has a single complex infinite prime, and no real infinite primes. Hence totally complex.
• ${\mathbb Q(\sqrt{5})}$ has two real infinite primes, and no complex infinite primes. Hence totally real.

The motivation from this actually comes from the theory of valuations. Every prime corresponds exactly to a valuation; the infinite primes correspond to the Archimedean valuations while the finite primes correspond to the non-Archimedean valuations.

2. Modular arithmetic with infinite primes

A modulus is a formal product

$\displaystyle \mathfrak m = \prod_{\mathfrak p} \mathfrak p^{\nu(\mathfrak p)}$

where the product runs over all primes, finite and infinite. (Here ${\nu(\mathfrak p)}$ is a nonnegative integer, of which only finitely many are nonzero.) We also require that

• ${\nu(\mathfrak p) = 0}$ for any infinite prime ${\mathfrak p}$, and
• ${\nu(\mathfrak p) \le 1}$ for any real prime ${\mathfrak p}$.

Obviously, every ${\mathfrak m}$ can be written as ${\mathfrak m = \mathfrak m_0\mathfrak m_\infty}$ by separating the finite from the (real) infinite primes.

We say ${a \equiv b \pmod{\mathfrak m}}$ if

• If ${\mathfrak p}$ is a finite prime, then ${a \equiv b \pmod{\mathfrak p^{\nu(\mathfrak p)}}}$ means exactly what you think it should mean: ${a-b \in \mathfrak p^{\nu(\mathfrak p)}}$.
• If ${\mathfrak p}$ is a real infinite prime ${\sigma : K \rightarrow \mathbb R}$, then ${a \equiv b \pmod{\mathfrak p}}$ means that ${\sigma(a/b) > 0}$.

Of course, ${a \equiv b \pmod{\mathfrak m}}$ means ${a \equiv b}$ modulo each prime power in ${\mathfrak m}$. With this, we can define a generalization of the class group:

Definition 2

Let ${\mathfrak m}$ be a modulus of a number field ${K}$.

• Let ${I_K(\mathfrak m)}$ to be the set of all fractional ideals of ${K}$ which are relatively prime to ${\mathfrak m}$, which is an abelian group.
• Let ${P_K(\mathfrak m)}$ be the subgroup of ${I_K(\mathfrak m)}$ generated by

$\displaystyle \left\{ \alpha \mathcal O_K \mid \alpha \in K^\times \text{ and } \alpha \equiv 1 \pmod{\mathfrak m} \right\}.$

This is sometimes called a “ray” of principal ideals.

Finally define the ray class group of ${\mathfrak m}$ to be ${C_K(\mathfrak m) = I_K(\mathfrak m) / P_K(\mathfrak m)}$.

This group is known to always be finite. Note the usual class group is ${C_K(1)}$.

One last definition that we’ll use right after Artin reciprocity:

Definition 3

A congruence subgroup of ${\mathfrak m}$ is a subgroup ${H}$ with

$\displaystyle P_K(\mathfrak m) \subseteq H \subseteq I_K(\mathfrak m).$

Thus ${C_K(\mathfrak m)}$ is a group which contains a lattice of various quotients ${I_K(\mathfrak m) / H}$, where ${H}$ is a congruence subgroup.

This definition takes a while to get used to, so here are examples.

Example 4 (Ray class groups in ${\mathbb Q}$)

Consider ${K = \mathbb Q}$ with infinite prime ${\infty}$. Then

• If we take ${\mathfrak m = 1}$ then ${I_\mathbb Q(1)}$ is all fractional ideals, and ${P_\mathbb Q(1)}$ is all principal fractional ideals. Their quotient is the usual class group of ${\mathbb Q}$, which is trivial.

• Now take ${\mathfrak m = 8}$. Thus ${I_\mathbb Q(8) \cong \left\{ \frac ab\mathbb Z \mid a/b \equiv 1,3,5,7 \pmod 8 \right\}}$. Moreover

$\displaystyle P_\mathbb Q(8) \cong \left\{ \frac ab\mathbb Z \mid a/b \equiv 1 \pmod 8 \right\}.$

You might at first glance think that the quotient is thus ${(\mathbb Z/8\mathbb Z)^\times}$. But the issue is that we are dealing with ideals: specifically, we have

$\displaystyle 7\mathbb Z = -7\mathbb Z \in P_\mathbb Q(8)$

because ${-7 \equiv 1 \pmod 8}$. So actually, we get

$\displaystyle C_\mathbb Q(8) \cong \left\{ 1,3,5,7 \text{ mod } 8 \right\} / \left\{ 1,7 \text{ mod } 8 \right\} \cong (\mathbb Z/4\mathbb Z)^\times.$

• Now take ${\mathfrak m = \infty}$. As before ${I_\mathbb Q(\infty) = \mathbb Q^\times}$. Now, by definition we have

$\displaystyle P_\mathbb Q(\infty) = \left\{ \frac ab \mathbb Z \mid a/b > 0 \right\}.$

At first glance you might think this was ${\mathbb Q_{>0}}$, but the same behavior with ideals shows in fact ${P_\mathbb Q(\infty) = \mathbb Q^\times}$. So in this case, ${P_\mathbb Q(\infty)}$ still has all principal fractional ideals. Therefore, ${C_\mathbb Q(\infty)}$ is still trivial.

• Finally, let ${\mathfrak m = 8\infty}$. As before ${I_\mathbb Q(8\infty) \cong \left\{ \frac ab\mathbb Z \mid a/b \equiv 1,3,5,7 \pmod 8 \right\}}$. Now in this case:

$\displaystyle P_\mathbb Q(8\infty) \cong \left\{ \frac ab\mathbb Z \mid a/b \equiv 1 \pmod 8 \text{ and } a/b > 0 \right\}.$

This time, we really do have ${-7\mathbb Z \notin P_\mathbb Q(8\infty)}$: we have ${7 \not\equiv 1 \pmod 8}$ and also ${-8 < 0}$. So neither of the generators of ${7\mathbb Z}$ are in ${P_\mathbb Q(8\infty)}$. Thus we finally obtain

$\displaystyle C_\mathbb Q(8\infty) \cong \left\{ 1,3,5,7 \text{ mod } 8 \right\} / \left\{ 1 \text{ mod } 8 \right\} \cong (\mathbb Z/8\mathbb Z)^\times$

with the bijection ${C_\mathbb Q(8\infty) \rightarrow (\mathbb Z/8\mathbb Z)^\times}$ given by ${a\mathbb Z \mapsto |a| \pmod 8}$.

Generalizing these examples, we see that

\displaystyle \begin{aligned} C_\mathbb Q(m) &= (\mathbb Z/m\mathbb Z)^\times / \{\pm1\} \\ C_\mathbb Q(m\infty) &= (\mathbb Z/m\mathbb Z)^\times. \end{aligned}

3. Infinite primes in extensions

I want to emphasize that everything above is intrinsic to a particular number field ${K}$. After this point we are going to consider extensions ${L/K}$ but it is important to keep in mind the distinction that the concept of modulus and ray class group are objects defined solely from ${K}$ rather than the above ${L}$.

Now take a Galois extension ${L/K}$ of degree ${m}$. We already know prime ideals ${\mathfrak p}$ of ${K}$ break into a produt of prime ideals ${\mathfrak P}$ of ${K}$ in ${L}$ in a nice way, so we want to do the same thing with infinite primes. This is straightforward: each of the ${n}$ infinite primes ${\sigma : K \rightarrow \mathbb C}$ lifts to ${m}$ infinite primes ${\tau : L \rightarrow \mathbb C}$, by which I mean the diagram

commutes. Hence like before, each infinite prime ${\sigma}$ of ${K}$ has ${m}$ infinite primes ${\tau}$ of ${L}$ which lie above it.

For a real prime ${\sigma}$ of ${K}$, any of the resulting ${\tau}$ above it are complex, we say that the prime ${\sigma}$ ramifies in the extension ${L/K}$. Otherwise it is unramified in ${L/K}$. An infinite prime of ${K}$ is always unramified in ${L/K}$. In this way, we can talk about an unramified Galois extension ${L/K}$: it is one where all primes (finite or infinite) are unramified.

Example 5 (Ramification of ${\infty}$)

Let ${\infty}$ be the real infinite prime of ${\mathbb Q}$.

• ${\infty}$ is ramified in ${\mathbb Q(\sqrt{-5})/\mathbb Q}$.
• ${\infty}$ is unramified in ${\mathbb Q(\sqrt{5})/\mathbb Q}$.

Note also that if ${K}$ is totally complex then any extension ${L/K}$ is unramified.

4. Frobenius element and Artin symbol

Recall the following key result:

Theorem 6 (Frobenius element)

Let ${L/K}$ be a Galois extension. If ${\mathfrak p}$ is a prime unramified in ${K}$, and ${\mathfrak P}$ a prime above it in ${L}$. Then there is a unique element of ${\mathop{\mathrm{Gal}}(L/K)}$, denoted ${\mathrm{Frob}_\mathfrak P}$, obeying

$\displaystyle \mathrm{Frob}_\mathfrak P(\alpha) \equiv \alpha^{N\mathfrak p} \pmod{\mathfrak P} \qquad \forall \alpha \in \mathcal O_L.$

Example 7 (Example of Frobenius elements)

Let ${L = \mathbb Q(i)}$, ${K = \mathbb Q}$. We have ${\mathop{\mathrm{Gal}}(L/K) \cong \mathbb Z/2\mathbb Z}$.

If ${p}$ is an odd prime with ${\mathfrak P}$ above it, then ${\mathrm{Frob}_\mathfrak P}$ is the unique element such that

$\displaystyle (a+bi)^p \equiv \mathrm{Frob}_\mathfrak P(a+bi) \pmod{\mathfrak P}$

in ${\mathbb Z[i]}$. In particular,

$\displaystyle \mathrm{Frob}_\mathfrak P(i) = i^p = \begin{cases} i & p \equiv 1 \pmod 4 \\ -i & p \equiv 3 \pmod 4. \end{cases}$

From this we see that ${\mathrm{Frob}_\mathfrak P}$ is the identity when ${p \equiv 1 \pmod 4}$ and ${\mathrm{Frob}_\mathfrak P}$ is complex conjugation when ${p \equiv 3 \pmod 4}$.

Example 8 (Cyclotomic Frobenius element)

Generalizing previous example, let ${L = \mathbb Q(\zeta)}$ and ${K = \mathbb Q}$, with ${\zeta}$ an ${m}$th root of unity. It’s well-known that ${L/K}$ is unramified outside ${\infty}$ and prime factors of ${m}$. Moreover, the Galois group ${\mathop{\mathrm{Gal}}(L/K)}$ is ${(\mathbb Z/m\mathbb Z)^\times}$: the Galois group consists of elements of the form

$\displaystyle \sigma_n : \zeta \mapsto \zeta^n$

and ${\mathop{\mathrm{Gal}}(L/K) = \left\{ \sigma_n \mid n \in (\mathbb Z/m\mathbb Z)^\times \right\}}$.

Then it follows just like before that if ${p \nmid n}$ is prime and ${\mathfrak P}$ is above ${p}$

$\displaystyle \mathrm{Frob}_\mathfrak P(x) = \sigma_p.$

An important property of the Frobenius element is its order is related to the decomposition of ${\mathfrak p}$ in the higher field ${L}$ in the nicest way possible:

Lemma 9 (Order of the Frobenius element)

The Frobenius element ${\mathrm{Frob}_\mathfrak P \in \mathop{\mathrm{Gal}}(L/K)}$ of an extension ${L/K}$ has order equal to the inertial degree of ${\mathfrak P}$, that is,

$\displaystyle \mathop{\mathrm{ord}} \mathrm{Frob}_\mathfrak P = f(\mathfrak P \mid \mathfrak p).$

In particular, ${\mathrm{Frob}_\mathfrak P = \mathrm{id}}$ if and only if ${\mathfrak p}$ splits completely in ${L/K}$.

Proof: We want to understand the order of the map ${T : x \mapsto x^{N\mathfrak p}}$ on the field ${\mathcal O_K / \mathfrak P}$. But the latter is isomorphic to the splitting field of ${X^{N\mathfrak P} - X}$ in ${\mathbb F_p}$, by Galois theory of finite fields. Hence the order is ${\log_{N\mathfrak p} (N\mathfrak P) = f(\mathfrak P \mid \mathfrak p)}$. $\Box$

Exercise 10

Deduce from this that the rational primes which split completely in ${\mathbb Q(\zeta)}$ are exactly those with ${p \equiv 1 \pmod m}$. Here ${\zeta}$ is an ${m}$th root of unity.

The Galois group acts transitively among the set of ${\mathfrak P}$ above a given ${\mathfrak p}$, so that we have

$\displaystyle \mathrm{Frob}_{\sigma(\mathfrak P)} = \sigma \circ (\mathrm{Frob}_{\mathfrak p}) \circ \sigma^{-1}.$

Thus ${\mathrm{Frob}_\mathfrak P}$ is determined by its underlying ${\mathfrak p}$ up to conjugation.

In class field theory, we are interested in abelian extensions, (which just means that ${\mathop{\mathrm{Gal}}(L/K)}$ is Galois) in which case the theory becomes extra nice: the conjugacy classes have size one.

Definition 11

Assume ${L/K}$ is an abelian extension. Then for a given unramified prime ${\mathfrak p}$ in ${K}$, the element ${\mathrm{Frob}_\mathfrak P}$ doesn’t depend on the choice of ${\mathfrak P}$. We denote the resulting ${\mathrm{Frob}_\mathfrak P}$ by the Artin symbol.

$\displaystyle \left( \frac{L/K}{\mathfrak p} \right).$

The definition of the Artin symbol is written deliberately to look like the Legendre symbol. To see why:

Example 12 (Legendre symbol subsumed by Artin symbol)

Suppose we want to understand ${(2/p) \equiv 2^{\frac{p-1}{2}}}$ where ${p > 2}$ is prime. Consider the element

$\displaystyle \left( \frac{\mathbb Q(\sqrt 2)/\mathbb Q}{p\mathbb Z} \right) \in \mathop{\mathrm{Gal}}(\mathbb Q(\sqrt 2) / \mathbb Q).$

It is uniquely determined by where it sends ${a}$. But in fact we have

$\displaystyle \left( \frac{\mathbb Q(\sqrt 2)/\mathbb Q}{p\mathbb Z} \right) \left( \sqrt 2 \right) \equiv \left( \sqrt 2 \right)^{p} \equiv 2^{\frac{p-1}{2}} \cdot \sqrt 2 \equiv \left( \frac 2p \right) \sqrt 2 \pmod{\mathfrak P}$

where ${\left( \frac 2p \right)}$ is the usual Legendre symbol, and ${\mathfrak P}$ is above ${p}$ in ${\mathbb Q(\sqrt 2)}$. Thus the Artin symbol generalizes the quadratic Legendre symbol.

Example 13 (Cubic Legendre symbol subsumed by Artin symbol)

Similarly, it also generalizes the cubic Legendre symbol. To see this, assume ${\theta}$ is primary in ${K = \mathbb Q(\sqrt{-3}) = \mathbb Q(\omega)}$ (thus ${\mathcal O_K = \mathbb Z[\omega]}$ is Eisenstein integers). Then for example

$\displaystyle \left( \frac{K(\sqrt[3]{2})/K}{\theta \mathcal O_K} \right) \left( \sqrt[3]{2} \right) \equiv \left( \sqrt[3]{2} \right)^{N(\theta)} \equiv 2^{\frac{N\theta-1}{3}} \cdot \sqrt 2 \equiv \left( \frac{2}{\theta} \right)_3 \sqrt[3]{2}. \pmod{\mathfrak P}$

where ${\mathfrak P}$ is above ${p}$ in ${K(\sqrt[3]{2})}$.

5. Artin reciprocity

Now, we further capitalize on the fact that ${\mathop{\mathrm{Gal}}(L/K)}$ is abelian. For brevity, in what follows let ${\mathop{\mathrm{Ram}}(L/K)}$ denote the primes of ${K}$ (either finite or infinite) which ramify in ${L}$.

Definition 14

Let ${L/K}$ be an abelian extension and let ${\mathfrak m}$ be divisible by every prime in ${\mathop{\mathrm{Ram}}(L/K)}$. Then since ${L/K}$ is abelian we can extend the Artin symbol multiplicatively to a map

$\displaystyle \left( \frac{L/K}{\bullet} \right) : I_K(\mathfrak m) \twoheadrightarrow \mathop{\mathrm{Gal}}(L/K).$

This is called the Artin map, and it is surjective (for example by Chebotarev Density).

Since the map above is surjective, we denote its kernel by ${H_{L/K}(\mathfrak m) \subseteq I_K(\mathfrak m)}$. Thus we have

$\displaystyle \mathop{\mathrm{Gal}}(L/K) \cong I_K(\mathfrak m) / H(\mathfrak m).$

We can now present the long-awaited Artin reciprocity theorem.

Theorem 15 (Artin reciprocity)

Let ${L/K}$ be an abelian extension. Then there is a modulus ${\mathfrak f = \mathfrak f(L/K)}$, divisible by exactly the primes of ${\mathop{\mathrm{Ram}}(L/K)}$, with the following property: if ${\mathfrak m}$ is divisible by all primes of ${\mathop{\mathrm{Ram}}(L/K)}$

$\displaystyle P_K(\mathfrak m) \subseteq H_{L/K}(\mathfrak m) \subseteq I_K(\mathfrak m) \quad\text{if and only if}\quad \mathfrak f \mid \mathfrak m$

We call ${\mathfrak f}$ the conductor of ${L/K}$.

So the conductor ${\mathfrak f}$ plays a similar role to the discriminant (divisible by exactly the primes which ramify), and when ${\mathfrak m}$ is divisible by the conductor, ${H(L/K, \mathfrak m)}$ is a congruence subgroup.

Note that for example, if we pick ${L}$ such that ${H(L/K, \mathfrak m) \cong P_K(\mathfrak m)}$ then Artin reciprocity means that there is an isomorphism

$\displaystyle \mathop{\mathrm{Gal}}(L/K) \cong I_K(\mathfrak m) / P_K(\mathfrak m) \cong C_K(\mathfrak m).$

More generally, we see that ${\mathop{\mathrm{Gal}}(L/K)}$ is always a subgroup some subgroup ${C_K(\mathfrak f)}$.

Example 16 (Cyclotomic field)

Let ${\zeta}$ be a primitive ${m}$th root of unity. We show in this example that

$\displaystyle H_{\mathbb Q(\zeta) / \mathbb Q} (m\infty) = P_\mathbb Q(m\infty) = \left\{ \frac ab \mathbb Z \mid a/b \equiv 1 \pmod m \right\}.$

This is the generic example of achieving the lower bound in Artin reciprocity. It also implies that ${\mathfrak f(\mathbb Q(\zeta)/\mathbb Q) \mid m\infty}$.

It’s well-known ${\mathbb Q(\zeta)/\mathbb Q}$ is unramified outside finite primes dividing ${m}$, so that the Artin symbol is defined on ${I_K(\mathfrak m)}$. Now the Artin map is given by

So we see that the kernel of this map is trivial, i.e.\ it is given by the identity of the Galois group, corresponding to $1 \pmod{m}$Then ${H = H_{L/K}(\mathfrak m)}$ for a unique abelian extension ${L/K}$.

Finally, such subgroups reverse inclusion in the best way possible:

Lemma 17 (Inclusion-reversing congruence subgroups)

Fix a modulus ${\mathfrak m}$. Let ${L/K}$ and ${M/K}$ be abelian extensions and suppose ${\mathfrak m}$ is divisible by the conductors of ${L/K}$ and ${M/K}$. Then

$\displaystyle L \subseteq M \quad\text{if and only if}\quad H_{M/K}(\mathfrak m) \subseteq H_{L/K}(\mathfrak m)$

Here by ${L \subseteq M}$ we mean that ${L}$ is isomorphic to some subfield of ${M}$. Proof: Let us first prove the equivalence with ${\mathfrak m}$ fixed. In one direction, assume ${L \subseteq M}$; one can check from the definitions that the diagram

commutes, because it suffices to verify this for prime powers, which is just saying that Frobenius elements behave well with respect to restriction. Then the inclusion of kernels follows directly. The reverse direction is essentially the Takagi existence theorem. $\Box$
Note that we can always take ${\mathfrak m}$ to be the product of conductors here.

6. Consequences

With all this theory we can deduce the following two results.

Corollary 18 (Kronecker-Weber theorem)

Let ${L}$ be an abelian extension of ${\mathbb Q}$. Then ${L}$ is contained in a cyclic extension ${\mathbb Q(\zeta)}$ where ${\zeta}$ is an ${m}$th root of unity (for some ${m}$).

Proof: Suppose ${\mathfrak f(L/\mathbb Q) \mid m\infty}$ for some ${m}$. Then by the example from earlier we have the chain

$\displaystyle P_{\mathbb Q}(m\infty) = H(\mathbb Q(\zeta)/\mathbb Q, m\infty) \subseteq H(L/\mathbb Q, m) \subseteq I_\mathbb Q(m\infty).$

So by inclusion reversal we’re done. $\Box$

Corollary 19 (Hilbert class field)

Let ${K}$ be any number field. Then there exists a unique abelian extension ${E/K}$ which is unramified at all primes (finite or infinite) and such that

• ${E/K}$ is the maximal such extension by inclusion.
• ${\mathop{\mathrm{Gal}}(E/K)}$ is isomorphic to the class group of ${E}$.
• A prime ${\mathfrak p}$ of ${K}$ splits completely in ${E}$ if and only if it is principal.

We call ${E}$ the Hilbert class field of ${K}$.

Proof: Apply the Takagi existence theorem with ${\mathfrak m = 1}$ to obtain an unramified extension ${E/K}$ such that ${H(E/K, 1) = P_K(1)}$. We claim this works:

• To see it is maximal by inclusion, note that any other extension ${M/K}$ with this property has conductor ${1}$ (no primes divide the conductor), and then we have ${P_K(1) = H(E/K, 1) \subseteq H(M/K, 1) \subseteq I_K(1)}$, so inclusion reversal gives ${M \subseteq E}$.
• We have ${\mathop{\mathrm{Gal}}(L/K) \cong I_K(1) / P_K(1) = C_K(1)}$ the class group.
• The isomorphism in the previous part is given by the Artin symbol. So ${\mathfrak p}$ splits completely if and only if ${\left( \frac{L/K}{\mathfrak p} \right) = \mathrm{id}}$ if and only if ${\mathfrak p}$ is principal (trivial in ${C_K(1)}$).

This completes the proof. $\Box$

One can also derive quadratic and cubic reciprocity from Artin reciprocity; see this link for QR and this link for CR.

Some Notes on Valuations

There are some notes on valuations from the first lecture of Math 223a at Harvard.

1. Valuations

Let ${k}$ be a field.

Definition 1

A valuation

$\displaystyle \left\lvert - \right\rvert : k \rightarrow \mathbb R_{\ge 0}$

is a function obeying the axioms

• ${\left\lvert \alpha \right\rvert = 0 \iff \alpha = 0}$.
• ${\left\lvert \alpha\beta \right\rvert = \left\lvert \alpha \right\rvert \left\lvert \beta \right\rvert}$.
• Most importantly: there should exist a real constant ${C}$, such that ${\left\lvert 1+\alpha \right\rvert < C}$ whenever ${\left\lvert \alpha \right\rvert \le 1}$.

The third property is the interesting one. Note in particular it can be rewritten as ${\left\lvert a+b \right\rvert < C\max\{ \left\lvert a \right\rvert, \left\lvert b \right\rvert \}}$.

Note that we can recover ${\left\lvert 1 \right\rvert = \left\lvert 1 \right\rvert \left\lvert 1 \right\rvert \implies \left\lvert 1 \right\rvert = 1}$ immediately.

Example 2 (Examples of Valuations)

If ${k = \mathbb Q}$, we can take the standard absolute value. (Take ${C=2}$.)

Similarly, the usual ${p}$-adic evaluation, ${\nu_p}$, which sends ${p^a t}$ to ${p^{-a}}$. Here ${C = 1}$ is a valid constant.

These are the two examples one should always keep in mind: with number fields, all valuations look like one of these too. In fact, over ${\mathbb Q}$ it turns out that every valuation “is” one of these two valuations (for a suitable definition of equality). To make this precise:

Definition 3

We say ${\left\lvert - \right\rvert_1 \sim \left\lvert - \right\rvert_2}$ (i.e. two valuations on a field ${k}$ are equivalent) if there exists a constant ${k > 0}$ so that ${\left\lvert \alpha \right\rvert_1 = \left\lvert \alpha \right\rvert_2^k}$ for every ${\alpha \in k}$.

In particular, for any valuation we can force ${C = 2}$ to hold by taking an equivalent valuation to a sufficient power.

In that case, we obtain the following:

Lemma 4

In a valuation with ${C = 2}$, the triangle inequality holds.

Proof: First, observe that we can get

$\displaystyle \left\lvert \alpha + \beta \right\rvert \le 2 \max \left\{ \left\lvert \alpha \right\rvert, \left\lvert \beta \right\rvert \right\}.$

Applying this inductively, we obtain

$\displaystyle \left\lvert \sum_{i=1}^{2^r} a_i \right\rvert \le 2^r \max_i \left\lvert a_i \right\rvert$

$\displaystyle \sum_{i=1}^{n} a_i \le 2n\max_i \left\lvert a_i \right\rvert.$

From this, one can obtain

$\displaystyle \left\lvert \alpha+\beta \right\rvert^n \le \left\lvert \sum_{j=0}^n \binom nj \alpha^j \beta^{n-j} \right\rvert \le 2(n+1) \sum_{j=0}^n \left\lvert \binom nj \right\rvert \left\lvert \alpha \right\rvert^j \left\lvert \beta \right\rvert^{n-j} \le 4(n+1)\left( \left\lvert \alpha \right\rvert+\left\lvert \beta \right\rvert \right)^n.$

Letting ${n \rightarrow \infty}$ completes the proof. $\Box$

Next, we prove that

Lemma 5

If ${\omega^n=1}$ for some ${n}$, then${\left\lvert \omega \right\rvert = 1}$. In particular, on any finite field the only valuation is the trivial one which sends ${0}$ to ${0}$ and all elements to ${1}$.

Proof: Immediate, since ${\left\lvert \omega \right\rvert^n = 1}$. $\Box$

2. Topological field induced by valuations

Let ${k}$ be a field. Given a valuation on it, we can define a basis of open sets

$\displaystyle \left\{ \alpha \mid \left\lvert \alpha - a \right\rvert < d \right\}$

across all ${a \in K}$, ${d \in \mathbb R_{> 0}}$. One can check that the same valuation gives rise to the same topological spaces, so it is fine to assume ${C = 2}$ as discussed earlier; thus, in fact we can make ${k}$ into a metric space, with the valuation as the metric.

In what follows, we’ll always assume our valuation satisfies the triangle inequality. Then:

Lemma 6

Let ${k}$ be a field with a valuation. Viewing ${k}$ as a metric space, it is in fact a topological field, meaning addition and multiplication are continuous.

Proof: Trivial; let’s just check that multiplication is continuous. Observe that

\displaystyle \begin{aligned} \left\lvert (a+\varepsilon_1)(b+\varepsilon_2) - ab \right\rvert & \le \left\lvert \varepsilon_1\varepsilon_2 \right\rvert + \left\lvert a\varepsilon_2 \right\rvert + \left\lvert b\varepsilon_1 \right\rvert \\ &\rightarrow 0. \end{aligned}

$\Box$

Now, earlier we saw that two valuations which are equivalent induce the same topology. We now prove the following converse:

Proposition 7

If two valuations ${\left\lvert - \right\rvert_1}$ and ${\left\lvert - \right\rvert_2}$ give the same topology, then they are in fact equivalent.

Proof: Again, we may safely assume that both satisfy the triangle inequality. Next, observe that ${\left\lvert a \right\rvert < 1 \iff a^n \rightarrow 0}$ (according to the metric) and by taking reciprocals, ${\left\lvert a \right\rvert > 1 \iff a^{-n} \rightarrow 0}$.

Thus, given any ${\beta}$, ${\gamma}$ and integers ${m}$, ${n}$ we derive that

$\displaystyle \left\lvert \beta^n\gamma^m \right\rvert_1 < 1 \iff \left\lvert \beta^n\gamma^m \right\rvert < 1$

with similar statements holding with “${<}$” replaced by “${=}$”, “${>}$”. Taking logs, we derive that

$\displaystyle n \log\left\lvert \beta \right\rvert_1 + m \log \left\lvert \gamma \right\rvert_1 < 0 \iff n \log\left\lvert \beta \right\rvert_2 + m \log \left\lvert \gamma \right\rvert_1 < 0$

and the analogous statements for “${=}$”, “${>}$”. Now just choose an appropriate sequence of ${m}$, ${n}$ and we can deduce that

$\displaystyle \frac{\log \left\lvert \beta_1 \right\rvert}{\log \left\lvert \beta_2 \right\rvert} = \frac{\log \left\lvert \gamma_1 \right\rvert}{\log \left\lvert \gamma_2 \right\rvert}$

so it equals a fixed constant ${c}$ as desired. $\Box$

3. Discrete Valuations

Definition 8

We say a valuation ${\left\lvert - \right\rvert}$ is discrete if its image around ${1}$ is discrete, meaning that if ${\left\lvert a \right\rvert \in [1-\delta,1+\delta] \implies \left\lvert a \right\rvert = 1}$ for some real ${\delta}$. This is equivalent to requiring that ${\{\log\left\lvert a \right\rvert\}}$ is a discrete subgroup of the real numbers.

Thus, the real valuation (absolute value) isn’t discrete, while the ${p}$-adic one is.

4. Non-Archimedian Valuations

Most importantly:

Definition 9

A valuation ${\left\lvert - \right\rvert}$ is non-Archimedian if we can take ${C = 1}$ in our requirement that ${\left\lvert a \right\rvert \le 1 \implies \left\lvert 1+a \right\rvert \le C}$. Otherwise we say the valuation is Archimedian.

Thus the real valuation is Archimedian while the ${p}$-adic valuation is non-Archimedian.

Lemma 10

Given a non-Archimedian valuation ${\left\lvert - \right\rvert}$, we have ${\left\lvert b \right\rvert < \left\lvert a \right\rvert \implies \left\lvert a+b \right\rvert = \left\lvert a \right\rvert}$.

Proof: We have that

$\displaystyle \left\lvert a \right\rvert = \left\lvert (a+b)-b \right\rvert \le \max\left\{ \left\lvert a+b \right\rvert, \left\lvert b \right\rvert \right\}.$

On the other hand, ${\left\lvert a+b \right\rvert \le \max \{ \left\lvert a \right\rvert, \left\lvert b \right\rvert\}}$. $\Box$

Given a field ${k}$ and a non-Archimedian valuation on it, we can now consider the set

$\displaystyle \mathcal O = \left\{ a \in k \mid \left\lvert a \right\rvert \le 1 \right\}$

and by the previous lemma, this turns out to be a ring. (This is the point we use the fact that the valuation is non-Archimedian; without that ${\mathcal O}$ need not be closed under addition). Next, we define

$\displaystyle \mathcal P = \left\{ a \in k \mid \left\lvert a \right\rvert < 1 \right\} \subset \mathcal O$

which is an ideal. In fact it is maximal, because ${\mathcal O/\mathcal P}$ is the set of units in ${\mathcal O}$, and is thus necessarily a field.

Lemma 11

Two valuations are equal if they give the same ring ${\mathcal O}$ (as sets, not just up to isomorphism).

Proof: If the valuations are equivalent it’s trivial.

For the interesting converse direction (they have the same ring), the datum of the ring ${\mathcal O}$ lets us detect whether ${\left\lvert a \right\rvert < \left\lvert b \right\rvert}$ by simply checking whether ${\left\lvert ab^{-1} \right\rvert < 1}$. Hence same topology, hence same valuation. $\Box$

We will really only work with valuations which are obviously discrete. On the other hand, to detect non-Archimedian valuations, we have

Lemma 12

${\left\lvert - \right\rvert}$ is Archimedian if ${\left\lvert n \right\rvert \le 1}$ for every ${n = 1 + \dots + 1 \in k}$.

Proof: Clearly Archimedian ${\implies}$ ${\left\lvert n \right\rvert \le 1}$. The converse direction is more interesting; the proof is similar to the analytic trick we used earlier. Given ${\left\lvert a \right\rvert \le 1}$, we wish to prove ${\left\lvert 1+a \right\rvert \le 1}$. To do this, first assume the triangle inequality as usual, then

$\displaystyle \left\lvert 1+a \right\rvert^n < \sum_j \left\lvert \binom nj \right\rvert\left\lvert a \right\rvert^j \le \sum_{j=0}^n \left\lvert a \right\rvert^j \le \sum_{j=0}^n 1 = n+1.$

Finally, let ${n \rightarrow \infty}$ again. $\Box$
In particular, any field of finite characteristic in fact has ${\left\lvert n \right\rvert = 1}$ and thus all valuations are non-Archimedian.

5. Completions

We say that a field ${k}$ is complete with respect to a valuation ${\left\lvert - \right\rvert}$ if it is complete in the topological sense.

Theorem 13

Every field ${k}$ is with a valuation ${\left\lvert - \right\rvert}$ can be embedded into a complete field ${\overline{k}}$ in a way which respects the valuation.

For example, the completion of ${\mathbb Q}$ with the Euclidean valuation is ${\mathbb R}$. Proof: Define ${\overline{k}}$ to be the topological completion of ${k}$; then extend by continuity; $\Box$
Given ${k}$ and its completion ${\overline{k}}$ we use the same notation for the valuations of both.

Proposition 14

A valuation ${\left\lvert - \right\rvert}$ on ${\overline{k}}$ is non-Archimedian if and only if the valuation is non-Archimedian on ${k}$.

Proof: We saw non-Archimedian ${\iff}$ ${\left\lvert n \right\rvert \le 1}$ for every ${n = 1 + \dots + 1}$. $\Box$

Proposition 15

Assume ${\left\lvert - \right\rvert}$ is non-Archimedian on ${k}$ and hence ${\overline{k}}$. Then the set of values achieved by ${\left\lvert - \right\rvert}$ coincides for ${k}$ and ${\overline{k}}$, i.e. ${\{ \left\lvert k \right\rvert \} = \{ \left\lvert \overline{k} \right\rvert \}}$.

Not true for Archimedian valuations; consider ${\left\lvert \sqrt2 \right\rvert = \sqrt2 \notin \mathbb Q}$. Proof: Assume ${0 \neq b \in \overline{k}}$; then there is an ${a \in k}$ such that ${\left\lvert b-a \right\rvert < \left\lvert b \right\rvert}$ since ${k}$ is dense in ${\overline{k}}$. Then, ${\left\lvert b \right\rvert \le \max \{ \left\lvert b-a \right\rvert, \left\lvert a \right\rvert \}}$ which implies ${\left\lvert b \right\rvert = \left\lvert a \right\rvert}$. $\Box$

6. Weak Approximation Theorem

Proposition 16 (Weak Approximation Theorem)

Let ${\left\lvert-\right\rvert_i}$ be distinct nontrivial valuations of ${k}$ for ${i=1,\dots,n}$. Let ${k_i}$ denote the completion of ${k}$ with respect to ${\left\lvert-\right\rvert_i}$. Then the image

$\displaystyle k \hookrightarrow \prod_{i=1}^n k_i$

is dense.

This means that distinct valuations are as different as possible; for example, if ${\left\lvert-\right\rvert _1 = \left\lvert-\right\rvert _2}$ then we might get, say, a diagonal in ${\mathbb R \times \mathbb R}$ which is as far from dense as one can imagine. Another way to think of this is that this is an analogue of the Chinese Remainder Theorem.

Proof: We claim it suffices to exhibit ${\theta_i \in k}$ such that

$\displaystyle \left\lvert \theta_i \right\rvert_j \begin{cases} > 1 & i = j \\ < 1 & \text{otherwise}. \end{cases}$

Then

$\displaystyle \frac{\theta_i^r}{1+\theta_i^r} \rightarrow \begin{cases} 1 & \text{ in } \left\lvert-\right\rvert_i \\ 0 & \text{ otherwise}. \end{cases}$

Hence for any point ${(a_1, \dots, a_n)}$ we can take the image of ${\sum \frac{\theta_i^r}{1+\theta_i^r} a_i \in k}$. So it would follow that the image is dense.

Now, to construct the ${\theta_i}$ we proceed inductively. We first prove the result for ${n=2}$. Since the topologies are different, we exhibit ${\alpha}$, ${\beta}$ such that ${\left\lvert \alpha_1 \right\rvert < \left\lvert \alpha_2 \right\rvert}$ and ${\left\lvert \beta_1 \right\rvert > \left\lvert \beta_2 \right\rvert}$, and pick ${\theta=\alpha\beta^{-1}}$.

Now assume ${n \ge 3}$; it suffices to construct ${\theta_1}$. By induction, there is a ${\gamma}$ such that

$\displaystyle \left\lvert \gamma \right\rvert_1 > 1 \quad\text{and}\quad \left\lvert \gamma \right\rvert_i < 1 \text{ for } i = 2, \dots, n-1.$

Also, there is a ${\psi}$ such that

$\displaystyle \left\lvert \delta \right\rvert_1 > 1 \quad\text{and}\quad \left\lvert \delta \right\rvert_n < 1.$

Now we can pick

$\displaystyle \theta_1 = \begin{cases} \gamma & \left\lvert \gamma \right\rvert_n < 1 \\ \phi^r\gamma & \left\lvert \gamma \right\rvert_n = 1 \\ \frac{\gamma^r}{1+\gamma^r} & \left\lvert \gamma \right\rvert_n > 1 \\ \end{cases}$

for sufficiently large ${r}$. $\Box$

Linnik’s Theorem for Sato-Tate Laws on CM Elliptic Curves

\title{A Variant of Linnik for Elliptic Curves} \maketitle

Here I talk about my first project at the Emory REU. Prerequisites for this post: some familiarity with number fields.

1. Motivation: Arithemtic Progressions

Given a property ${P}$ about primes, there’s two questions we can ask:

1. How many primes ${\le x}$ are there with this property?
2. What’s the least prime with this property?

As an example, consider an arithmetic progression ${a}$, ${a+d}$, \dots, with ${a < d}$ and ${\gcd(a,d) = 1}$. The strong form of Dirichlet’s Theorem tells us that basically, the number of primes ${\equiv a \pmod d}$ is ${\frac 1d}$ the total number of primes. Moreover, the celebrated Linnik’s Theorem tells us that the first prime is ${O(d^L)}$ for a fixed ${L}$, with record ${L = 5}$.

As I talked about last time on my blog, the key ingredients were:

• Introducing Dirichlet characters ${\chi}$, which are periodic functions modulo ${q}$. One uses this to get the mod ${q}$ into the problem.
• Introducing an ${L}$-function ${L(s, \chi)}$ attached to ${\chi}$.
• Using complex analysis (Cauchy’s Residue Theorem) to boil the proof down to properties of the zeros of ${L(s, \chi)}$.

With that said, we now move to the object of interest: elliptic curves.

2. Counting Primes

Let ${E}$ be an elliptic curve over ${\mathbb Q}$, which for our purposes we can think of concretely as a curve in Weirestrass form

$\displaystyle y^2 = x^3 + Ax + B$

where the right-hand side has three distinct complex roots (viewed as a polynomial in ${x}$). If we are unlucky enough that the right-hand side has a double root, then the curve ceases to bear the name “elliptic curve” and instead becomes singular.

Here’s a natural number theoretic question: for any rational prime ${p}$, how many solutions does ${E}$ have modulo ${p}$?

To answer this it’s helpful to be able to think over an arbitrary field ${F}$. While we’ve written our elliptic curve ${E}$ as a curve over ${\mathbb Q}$, we could just as well regard it as a curve over ${\mathbb C}$, or as a curve over ${\mathbb Q(\sqrt 2)}$. Even better, since we’re interested in counting solutions modulo ${p}$, we can regard this as a curve over ${\mathbb F_p}$. To make this clear, we will use the notation ${E/F}$ to signal that we are thinking of our elliptic curve over the field ${F}$. Also, we write ${\#E(F)}$ to denote the number of points of the elliptic curve over ${F}$ (usually when ${F}$ is a finite field). Thus, the question boils down to computing ${\#E(\mathbb F_p)}$.

Anyways, the question above is given by the famous Hasse bound, and in fact it works over any number field!

Theorem 1 (Hasse Bound)

Let ${K}$ be a number field, and let ${E/K}$ be an elliptic curve. Consider any prime ideal ${\mathfrak p \subseteq \mathcal O_K}$ which is not ramified. Then we have

$\displaystyle \#E(\mathbb F_\mathfrak p) = \mathrm{N}\mathfrak p + 1 - a_\mathfrak p$

where ${\left\lvert a_\mathfrak p \right\rvert \le 2\sqrt{\mathrm{N}\mathfrak p}}$.

Here ${\mathbb F_\mathfrak p = \mathcal O_K / \mathfrak p}$ is the field of ${\mathrm{N}\mathfrak p}$ elements. The extra “${+1}$” comes from a point at infinity when you complete the elliptic curve in the projective plane.

Here, the ramification means what you might guess. Associated to every elliptic curve over ${\mathbb Q}$ is a conductor ${N}$, and a prime ${p}$ is ramified if it divides ${N}$. The finitely many ramified primes are the “bad” primes for which something breaks down when we take modulo ${p}$ (for example, perhaps the curve becomes singular).

In other words, for the ${\mathbb Q}$ case, except for finitely many bad primes ${p}$, the number of solutions is ${p + 1 + O(\sqrt p)}$, and we even know the implied ${O}$-constant to be ${2}$.

Now, how do we predict the error term?

3. The Sato-Tate Conjecture

For elliptic curves over ${\mathbb Q}$, we the Sato-Tate conjecture (which recently got upgraded to a theorem) more or less answers the question. But to state it, I have to introduce a new term: an elliptic curve ${E/\mathbb Q}$, when regarded over ${\mathbb C}$, can have complex multiplication (abbreviated CM). I’ll define this in just a moment, but for now, the two things to know are

• CM curves are “special cases”, in the sense that a randomly selected elliptic curve won’t have CM.
• It’s not easy in general to tell whether a given elliptic curve has CM.

Now I can state the Sato-Tate result. It is most elegantly stated in terms of the following notation: if we define ${a_p = p + 1 - \#E(\mathbb F_p)}$ as above, then there is a unique ${\theta_p \in [0,\pi]}$ which obeys

$\displaystyle a_p = 2 \sqrt p \cos \theta_p.$

Theorem 2 (Sato-Tate)

Fix an elliptic curve ${E/\mathbb Q}$ which does not have CM (when regarded over ${\mathbb C}$). Then as ${p}$ varies across unramified primes, the asymptotic probability that ${\theta_p \in [\alpha, \beta]}$ is

$\displaystyle \frac{2}{\pi} \int_{[\alpha, \beta]} \sin^2\theta_p.$

In other words, ${\theta_p}$ is distributed according to the measure ${\sin^2\theta}$.

Now, what about the CM case?

4. CM Elliptic Curves

Consider an elliptic curve ${E/\mathbb Q}$ but regard it as a curve over ${\mathbb C}$. It’s well known that elliptic curves happen to have a group law: given two points on an elliptic curve, you can add them to get a third point. (If you’re not familiar with this, Wikipedia has a nice explanation). So elliptic curves have more structure than just their set of points: they form an abelian group; when written in Weirerstrass form, the identity is the point at infinity.

Letting ${A = (A, +)}$ be the associated abelian group, we can look at the endomorphisms of ${E}$ (that is, homomorphisms ${A \rightarrow A}$). These form a ring, which we denote ${\text{End }(E)}$. An example of such an endomorphism is ${a \mapsto n \cdot a}$ for an integer ${n}$ (meaning ${a+\dots+a}$, ${n}$ times). In this way, we see that ${\mathbb Z \subseteq \text{End }(E)}$.

Most of the time we in fact have ${\text{End }(E) \cong \mathbb Z}$. But on occasion, we will find that ${\text{End }(E)}$ is congruent to ${\mathcal O_K}$, the ring of integers of a number field ${K}$. This is called complex multiplication by ${K}$.

Intuitively, this CM is special (despite being rare), because it means that the group structure associated to ${E}$ has a richer set of symmetry. For CM curves over any number field, for example, the Sato-Tate result becomes very clean, and is considerably more straightforward to prove.

Here’s an example. The elliptic curve

$\displaystyle E : y^2 = x^3 - 17 x$

of conductor ${N = 2^6 \cdot 17^2}$ turns out to have

$\displaystyle \text{End }(E) \cong \mathbb Z[i]$

i.e. it has complex multiplication has ${\mathbb Z[i]}$. Throwing out the bad primes ${2}$ and ${17}$, we compute the first several values of ${a_p}$, and something bizarre happens. For the ${3}$ mod ${4}$ primes we get

\displaystyle \begin{aligned} a_{3} &= 0 \\ a_{7} &= 0 \\ a_{11} &= 0 \\ a_{19} &= 0 \\ a_{23} &= 0 \\ a_{31} &= 0 \end{aligned}

and for the ${1}$ mod ${4}$ primes we have

\displaystyle \begin{aligned} a_5 &= 4 \\ a_{13} &= 6 \\ a_{29} &= 4 \\ a_{37} &= 12 \\ a_{41} &= -8 \end{aligned}

Astonishingly, the vanishing of ${a_p}$ is controlled by the splitting of ${p}$ in ${\mathbb Z[i]}$! In fact, this holds more generally. It’s a theorem that for elliptic curves ${E/\mathbb Q}$ with CM, we have ${\text{End }(E) \cong \mathcal O_K}$ where ${K}$ is some quadratic imaginary number field which is also a PID, like ${\mathbb Z[i]}$. Then ${\mathcal O_K}$ governs how the ${a_p}$ behave:

Theorem 3 (Sato-Tate Over CM)

Let ${E/\mathbb Q}$ be a fixed elliptic curve with CM by ${\mathcal O_K}$. Let ${\mathfrak p}$ be a unramified prime of ${\mathcal O_K}$.

1. If ${\mathfrak p}$ is inert, then ${a_\mathfrak p = 0}$ (i.e. ${\theta_\mathfrak p = \frac{1}{2}\pi}$).
2. If ${\mathfrak p}$ is split, then ${\theta_\mathfrak p}$ is uniform across ${[0, \pi]}$.

I’m told this is much easier to prove than the usual Sato-Tate.

But there’s even more going on in the background. If I look again at ${a_p}$ where ${p \equiv 1 \pmod 4}$, I might recall that ${p}$ can be written as the sum of squares, and construct the following table:

$\displaystyle \begin{array}{rrl} p & a_p & x^2+y^2 \\ 5 & 4 & 2^2 + 1^2 \\ 13 & 6 & 3^2 + 2^2 \\ 29 & 4 & 2^2 + 5^2 \\ 37 & 12 & 6^2 + 1^2 \\ 41 & -8 & 4^2 + 5^2 \\ 53 & 14 & 7^2 + 2^2 \\ 61 & 12 & 6^2 + 5^2 \\ 73 & -16 & 8^2 + 3^2 \\ 89 & -10 & 5^2 + 8^2 \\ \end{array}$

Each ${a_p}$ is double one of the terms! There is no mistake: the ${a_p}$ are also tied to the decomposition of ${p = x^2+y^2}$. And this works for any number field.

What’s happening? The main idea is that looking at a prime ideal ${\mathfrak p = (x+yi)}$, ${a_\mathfrak p}$ is related to the argument of the complex number ${x+yi}$ in some way. Of course, there are lots of questions unanswered (how to pick the ${\pm}$ sign, and which of ${x}$ and ${y}$ to choose) but there’s a nice way to package all this information, as I’ll explain momentarily.

(Aside: I think the choice of having ${x}$ be the odd or even number depends precisely on whether ${p}$ is a quadratic residue modulo ${17}$, but I’ll have to check on that.)

5. ${L}$-Functions

I’ll just briefly explain where all this is coming from, and omit lots of details (in part because I don’t know all of them). Let ${E/\mathbb Q}$ be an elliptic curve with CM by ${\mathcal O_K}$. We can define an associated ${L}$-function

$\displaystyle L(s, E/K) = \prod_\mathfrak p \left( 1 - \frac{a_\mathfrak p}{(\mathrm{N}\mathfrak p)^{s+\frac{1}{2}}} + \frac{1}{(\mathrm{N}\mathfrak p)^{2s}} \right)$

(actually this isn’t quite true actually, some terms change for ramified primes ${\mathfrak p}$).

At the same time there’s a notion of a Hecke Grössencharakter ${\xi}$ on a number field ${K}$ — a higher dimensional analog of the Dirichlet charaters we used on ${\mathbb Z}$ to filter modulo ${q}$. For our purposes, think of it as a multiplicative function which takes in ideals of ${\mathcal O_K}$ and returns complex numbers of norm ${1}$. Like Dirichlet characters, each ${\xi}$ gets a Hecke ${L}$-function

$\displaystyle L(s, \xi) = \prod_\mathfrak p \left( 1 - \frac{\xi(\mathfrak p)}{(\mathrm{N}\mathfrak p)^s} \right)$

which again extends to a meromorphic function on the entire complex plane.

Now the great theorem is:

Theorem 4 (Deuring)

Let ${E/\mathbb Q}$ have CM by ${\mathcal O_K}$. Then

$\displaystyle L(s,E/K) = L(s, \xi)L(s, \overline{\xi})$

for some Hecke Grössencharakter ${\xi}$.

Using the definitions given above and equating the Euler products at an unramified ${\mathfrak p}$ gives

$\displaystyle 1 - \frac{a_\mathfrak p}{(\mathrm{N}\mathfrak p)^{s+\frac{1}{2}}} + \frac{1}{(\mathrm{N}\mathfrak p)^{2s}} = \left( 1 - \frac{\xi(\mathfrak p)}{(\mathrm{N}\mathfrak p)^s} \right) \left( 1 - \frac{\overline{\xi(\mathfrak p)}}{(\mathrm{N}\mathfrak p)^s} \right)$

Upon recalling that ${a_\mathfrak p = 2 \sqrt{\mathrm{N}\mathfrak p} \cos \theta_\mathfrak p}$, we derive

$\displaystyle \xi(\mathfrak p) = \exp(\pm i \theta_\mathfrak p).$

This is enough to determine the entire ${\xi}$ since ${\xi}$ is multiplicative.

So this is the result: let ${E/\mathbb Q}$ be an elliptic curve of conductor ${N}$. Given our quadratic number field ${K}$, we define a map ${\xi}$ from prime ideals of ${\mathcal O_K}$ to the unit circle in ${\mathbb C}$ by

$\displaystyle \mathfrak p \mapsto \begin{cases} \exp(\pm i \theta_\mathfrak p) & \gcd(\mathrm{N}\mathfrak p, N) = 1 \\ 0 & \gcd(\mathrm{N}\mathfrak p, N) > 1. \end{cases}$

Thus ${\xi}$ is a Hecke Grössencharakter for some choice of ${\pm}$ at each ${\mathfrak p}$.

It turns out furthermore that ${\xi}$ has frequency ${1}$, which roughly means that the argument of ${\xi\left( (\pi) \right)}$ is related to ${1}$ times the argument of ${\pi}$ itself. This fact is what explains the mysterious connection between the ${a_p}$ and the solutions above.

6. Linnik-Type Result

With this in mind, I can now frame the main question: suppose we have an interval ${[\alpha, \beta] \subset [0,\pi]}$. What’s the first prime ${p}$ such that ${\theta_p \in [\alpha, \beta]}$? We’d love to have some analog of Linnik’s Theorem here.

This was our project and the REU, and Ashvin, Peter and I proved that

Theorem 5

If a rational ${E}$ has CM then the least prime ${p}$ with ${\theta_p \in [\alpha,\beta]}$ is

$\displaystyle \ll \left( \frac{N}{\beta-\alpha} \right)^A.$

I might blog later about what else goes into the proof of this. . . but Deuring’s result is one key ingredient, and a proof of an analogous theorem for non-CM curves would have to be very different.

Proof of Dirichlet’s Theorem on Arithmetic Progressions

In this post I will sketch a proof Dirichlet Theorem’s in the following form:

Theorem 1 (Dirichlet’s Theorem on Arithmetic Progression)

Let

$\displaystyle \psi(x;q,a) = \sum_{\substack{n \le x \\ n \equiv a \mod q}} \Lambda(n).$

Let ${N}$ be a positive constant. Then for some constant ${C(N) > 0}$ depending on ${N}$, we have for any ${q}$ such that ${q \le (\log x)^N}$ we have

$\displaystyle \psi(x;q,a) = \frac{1}{\phi(q)} x + O\left( x\exp\left(-C(N) \sqrt{\log x}\right) \right)$

uniformly in ${q}$.

Prerequisites: complex analysis, previous two posts, possibly also Dirichlet characters. It is probably also advisable to read the last chapter of Hildebrand first, since this contains a much more thorough version of an easier version in which the zeros of ${L}$-functions are less involved.

Warning: I really don’t understand what I am saying. It is at least 50% likely that this post contains a major error, and 90% likely that there will be multiple minor errors. Please kindly point out any screw-ups of mine; thanks!

Throughout this post: ${s = \sigma + it}$ and ${\rho = \beta + i \gamma}$, as always. All ${O}$-estimates have absolute constants unless noted otherwise, and ${A \ll B}$ means ${A = O(B)}$, ${A \asymp B}$ means ${A \ll B \ll A}$. By abuse of notation, ${\mathcal L}$ will be short for either ${\log q \left( \left\lvert t \right\rvert + 2 \right)}$ or ${\log q \left( \left\lvert T \right\rvert + 2 \right)}$, depending on context.

1. Outline

Here are the main steps:

1. We introduce Dirichlet character ${\chi : \mathbb N \rightarrow \mathbb C}$ which will serves as a roots of unity filter, extracting terms ${\equiv a \pmod q}$. We will see that this reduces the problem to estimating the function ${\psi(x,\chi) = \sum_{n \le x} \chi(n) \Lambda(n)}$.
• Introduce the ${L}$-function ${L(s, \chi)}$, the generalization of ${\zeta}$ for arithmetic progressions. Establish a functional equation in terms of ${\xi(\chi,s)}$, much like with ${\zeta}$, and use it to extend ${L(s,\chi)}$ to a meromorphic function in the entire complex plane.
• We will use a variation on the Perron transformation in order to transform this sum into an integral involving an ${L}$-function ${L(\chi,s)}$. We truncate this integral to ${[c-iT, c+iT]}$; this introduces an error ${E_{\text{truncate}}}$ that can be computed immediately, though in this presentation we delay its computation until later.
• We do a contour as in the proof of the Prime Number Theorem in order to estimate the above integral in terms of the zeros of ${L(\chi, s)}$. The main term emerges as a residue, so we want to show that the integral ${E_{\text{contour}}}$ along this integral goes to zero. Moreover, we get some residues ${\sum_\rho \frac{x^\rho}{\rho}}$ related to the zeros of the ${L}$-function.
• By using Hadamard’s Theorem on ${\xi(\chi,s)}$ which is entire, we can write ${\frac{L'}{L}(s,\chi)}$ in terms of its zeros. This has three consequences:
1. We can use the previous to get bounds on ${\frac{L'}{L}(s, \chi)}$.
2. Using a 3-4-1 trick, this gives us information on the horizontal distribution of ${\rho}$; the dreaded Siegel zeros appear here.
3. We can get an expression which lets us estimate the vertical distribution of the zeros in the critical strip (specifically the number of zeros with ${\gamma \in [T-1, T+1]}$).

The first and third points let us compute ${E_{\text{contour}}}$.

• The horizontal zero-free region gives us an estimate of ${\sum_\rho \frac{x^\rho}{\rho}}$, which along with ${E_{\text{contour}}}$ and ${E_{\text{truncate}}}$ gives us the value of ${\psi(x,\chi)}$.
• We use Siegel’s Theorem to handle the potential Siegel zero that might arise.

The pink dots denote zeros; we think the nontrivial ones all lie on the half-line by the Generalized Riemann Hypothesis but they could actually be anywhere in the green strip.

2. Dirichlet Characters

2.1. Definitions

Recall that a Dirichlet character ${\chi}$ modulo ${q}$ is a completely multiplicative function ${\chi : \mathbb N \rightarrow \mathbb C}$ which is also periodic modulo ${q}$, and vanishes for all ${n}$ with ${\gcd(n,q) > 1}$. The trivial character (denoted ${\chi_0}$) is defined by ${\chi_0(n) = 1}$ when ${\gcd(n,q)=1}$ and ${\chi_0(n) = 0}$ otherwise.

In particular, ${\chi(1)=1}$ and thus each nonzero ${\chi}$ value is a ${\phi(q)}$-th primitive root of unity; there are also exactly ${\phi(q)}$ Dirichlet characters modulo ${q}$. Observe that ${\chi(-1)^2 = \chi(1) = 1}$, so ${\chi(-1) = \pm 1}$. We shall call ${\chi}$ even if ${\chi(1) = +1}$ and odd otherwise.

If ${\tilde q \mid q}$, then a character ${\tilde\chi}$ modulo ${\tilde q}$ induces a character ${\chi}$ modulo ${q}$ in a natural way: let ${\chi = \tilde\chi}$ except at the points where ${\gcd(n,q)>1}$ but ${\gcd(n,\tilde q)=1}$, letting ${\chi}$ be zero at these points instead. (In effect, we are throwing away information about ${\tilde\chi}$.) A character ${\chi}$ not induced by any smaller character is called primitive.

2.2. Orthogonality

The key fact about Dirichlet characters which will enable us to prove the theorem is the following trick:

Theorem 2 (Orthogonality of Dirichlet Characters)

We have

$\displaystyle \sum_{\chi \mod q} \chi(a) \overline{\chi}(b) = \begin{cases} \phi(q) & \text{ if } a \equiv b \pmod q, \gcd(a,q) = 1 \\ 0 & \text{otherwise}. \end{cases}$

(Here ${\overline{\chi}}$ is the conjugate of ${\chi}$, which is essentially a multiplicative inverse.)

This is in some senses a slightly fancier form of the old roots of unity filter. Specifically, it is not too hard to show that ${\sum_{\chi} \chi(n)}$ vanishes for ${n \not\equiv 1 \pmod q}$ while it is equal to ${\phi(q)}$ for ${n \equiv 1 \pmod q}$.

2.3. Dirichlet ${L}$-Functions

Now we can define the associated ${L}$-function by

$\displaystyle L(\chi, s) = \sum_{n \ge 1} \chi(n) n^{-s} = \prod_p \left( 1-\chi(p) p^{-s} \right)^{-1}.$

The properties of these ${L}$-functions are that

Theorem 3

Let ${\chi}$ be a Dirichlet character modulo ${q}$. Then

1. If ${\chi \ne \chi_0}$, ${L(\chi, s)}$ can be extended to a holomorphic function on ${\sigma > 0}$.
2. If ${\chi = \chi_0}$, ${L(\chi, s)}$ can be extended to a meromorphic function on ${\sigma > 0}$, with a single simple pole at ${s=1}$ of residue ${\phi(q) / q}$.

The proof is pretty much the same as for zeta.

Observe that if ${q=1}$, then ${L(\chi, s) = \zeta(s)}$.

2.4. The Functional Equation for Dirichlet ${L}$-Functions

While I won’t prove it here, one can show the following analog of the functional equation for Dirichlet ${L}$-functions.

Theorem 4 (The Functional Equation of Dirichlet ${L}$-Functions)

Assume that ${\chi}$ is a character modulo ${q}$, possibly trivial or imprimitive. Let ${a=0}$ if ${\chi}$ is even and ${a=1}$ if ${\chi}$ is odd. Let

$\displaystyle \xi(s,\chi) = q^{\frac{1}{2}(s+a)} \gamma(s,\chi) L(s,\chi) \left[ s(1-s) \right]^{\delta(x)}$

where

$\displaystyle \gamma(s,\chi) = \pi^{-\frac{1}{2}(s+a)} \Gamma\left( \frac{s+a}{2} \right)$

and ${\delta(\chi) = 1}$ if ${\chi = \chi_0}$ and zero otherwise. Then

1. ${\xi}$ is entire.
2. If ${\chi}$ is primitive, then ${\xi(s,\chi) = W(\chi)\xi(1-s, \overline{\chi})}$ for some complex number ${\left\lvert W(\chi) \right\rvert = 1}$.

Unlike the ${\zeta}$ case, the ${W(\chi)}$ is nastier to describe; computing it involves some Gauss sums that would be too involved for this post. However, I should point out that it is the Gauss sum here that requires ${\chi}$ to be primitive. As before, ${\xi}$ gives us an meromorphic continuation of ${L(\chi, s)}$ in the entire complex plane. We obtain trivial zeros of ${L(\chi, s)}$ as follows:

• For ${\chi}$ even, we get zeros at ${-2}$, ${-4}$, ${-6}$ and so on.
• For ${\chi \neq \chi_0}$ even, we get zeros at ${0}$, ${-2}$, ${-4}$, ${-6}$ and so on (since the pole of ${\Gamma(\frac{1}{2} s)}$ at ${s=0}$ is no longer canceled).
• For ${\chi}$ odd, we get zeros at ${-1}$, ${-3}$, ${-5}$ and so on.

3. Obtaining the Contour Integral

3.1. Orthogonality

Using the trick of orthogonality, we may write

\displaystyle \begin{aligned} \psi(x;q,a) &= \sum_{n \le x} \frac{1}{\phi(q)} \sum_{\chi \mod q} \chi(n)\overline{\chi}(a) \Lambda(n) \\ &= \frac{1}{\phi(q)} \sum_{\chi \mod q} \overline{\chi}(a) \left( \sum_{n \le x} \chi(n) \Lambda(n) \right). \end{aligned}

To do this we have to estimate the sum ${\sum_{n \le x} \chi(n) \Lambda(n)}$.

3.2. Introducing the Logarithmic Derivative of the ${L}$-Function

First, we realize ${\chi(n) \Lambda(n)}$ as the coefficients of a Dirichlet series. Recall last time we saw that ${-\frac{\zeta'}{\zeta}}$ gave ${\Lambda}$ as coefficients. We can do the same thing with ${L}$-functions: put

$\displaystyle \log L(s, \chi) = -\sum_p \log \left( 1 - \chi(p) p^{-s} \right).$

Taking the derivative, we obtain

Theorem 5

For any ${\chi}$ (possibly trivial or imprimitive) we have

$\displaystyle -\frac{L'}{L}(s, \chi) = \sum_{n \ge 1} \Lambda(n) \chi(n) n^{-s}.$

Proof:

\displaystyle \begin{aligned} -\frac{L'}{L}(s, \chi) &= \sum_p \frac{\log p}{1-\chi(p) p^{-s}} \\ &= \sum_p \log p \cdot \sum_{m \ge 1} \chi(p^m) (p^m)^{-s} \\ &= \sum_{n \ge 1} \Lambda(n) \chi(n) n^{-s} \end{aligned}

as desired. $\Box$

3.3. The Truncation Trick

Now, we unveil the trick at the heart of the proof of Perron’s Formula in the last post. I will give a more precise statement this time, by stating where this integral comes from:

Lemma 6 (Truncated Version of Perron Lemma)

For any ${c,y,T > 0}$ define

$\displaystyle I(y,T) = \frac{1}{2\pi i} \int_{c-iT}^{c+iT} \frac{y^s}{s} \; ds$

Then ${I(y,T) = \delta(y) + E(y,T)}$ where ${\delta(y)}$ is the indicator function defined by

$\displaystyle \delta(y) = \begin{cases} 0 & 0 < y < 1 \\ \frac{1}{2} & y=1 \\ 1 & y > 1 \end{cases}$

and the error term ${E(y,T)}$ is given by

$\displaystyle \left\lvert E(y,T) \right\rvert < \begin{cases} y^c \min \left\{ 1, \frac{1}{T \left\lvert \log y \right\rvert} \right\} & y \neq 1 \\ cT^{-1} & y=1. \end{cases}$

In particular, ${I(y,\infty) = \delta(y)}$.

In effect, the integral from ${c-iT}$ to ${c+iT}$ is intended to mimic an indicator function. We can use it to extract the terms of the Dirichlet series of ${-\frac{L'}{L}(s, \chi)}$ which happen to have ${n \le x}$, by simply appealing to ${\delta(x/n)}$. Unfortunately, we cannot take ${T = \infty}$ because later on this would introduce a sum which is not absolutely convergent, meaning we will have to live with the error term introduced by picking a particular finite value of ${T}$.

3.4. Applying the Truncation

Let’s do so: define

$\displaystyle \psi(x;\chi) = \sum_{n \ge 1} \delta\left( x/n \right) \Lambda(n) \chi(n)$

which is almost the same as ${\sum_{n \le x} \Lambda(n) \chi(n)}$, except that if ${x}$ is actually an integer then ${\Lambda(x)\chi(x)}$ should be halved (since ${\delta(\frac{1}{2}) = \frac{1}{2}}$). Now, we can substitute in our integral representation, and obtain

\displaystyle \begin{aligned} \psi(x;\chi) &= \sum_{n \ge 1} \Lambda(n) \chi(n) \cdot \left( E(x/n,T) + \int_{c-iT}^{c+iT} \frac{(x/n)^s}{s} \; ds \right) \\ &= \sum_{n \ge 1} \Lambda(n) \chi(n) \cdot E(x/n, T) + \int_{c-iT}^{c+iT} \sum_{n \ge 1} \left( \Lambda(n)\chi(n) n^{-s} \right) \frac{x^s}{s} \; ds \\ &= E_{\text{truncate}} + \int_{c-iT}^{c+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds \end{aligned}

where

$\displaystyle E_{\text{truncate}} = \sum_{n \ge 1} \Lambda(n) \chi(n) \cdot E(x/n, T).$

Estimating this is quite ugly, so we defer it to later.

4. Applying the Residue Theorem

4.1. Primitive Characters

Exactly like before, we are going to use a contour to estimate the value of

$\displaystyle \int_{c-iT}^{c+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds.$

Let ${U}$ be a large half-integer (so no zeros of ${L(\chi,s)}$ with ${\text{Re } s = U}$). We then re-route the integration path along the contour integral

$\displaystyle c-iT \rightarrow -U-iT \rightarrow -U+iT \rightarrow c+iT.$

During this process we pick up residues, which are the interesting terms.

First, assume that ${\chi}$ is primitive, so the functional equation applies and we get the information we want about zeros.

• If ${\chi = \chi_0}$, then so we pick up a residue of ${+x}$ corresponding to

$\displaystyle (-1) \cdot -x^1/1 = +x.$

This is the “main term”. Per laziness, ${\delta(\chi) x}$ it is.

• Depending on whether ${\chi}$ is odd or even, we detect the trivial zeros, which we can express succinctly by

$\displaystyle \sum_{m \ge 1} \frac{x^{a-2m}}{2m-a}$

Actually, I really ought to truncate this at ${U}$, but since I’m going to let ${U \rightarrow \infty}$ in a moment I really don’t want to take the time to do so; the difference is negligible.

• We obtain a residue of ${-\frac{L'}{L}(s, \chi)}$ at ${s = 0}$, which we denote ${b(\chi)}$, for ${s=0}$. Observe that if ${\chi}$ is even, this is the constant term of ${-\frac{L'}{L}(s, \chi)}$ near ${s=0}$ (but there is a pole of the whole function at ${s=0}$); otherwise it equals the value of ${-\frac{L'}{L}(0, \chi)}$ straight-out.
• If ${\chi \ne \chi_0}$ is even then ${L(s, \chi)}$ itself has a zero, so we are in worse shape. We recall that

$\displaystyle \frac{L'}{L}(s, \chi) = \frac 1 s + b(\chi) + \dots$

and notice that

$\displaystyle \frac{x^s}{s} = \frac 1s + \log x + \dots$

so we pick up an extra residue of ${-\log x}$. So, call this a bonus of ${-(1-a) \log x}$

• Finally, the hard-to-understand zeros in the strip ${0 < \sigma < 1}$. If ${\rho = \beta+i\gamma}$ is a zero, then it contributes a residue of ${-\frac{x^\rho}{\rho}}$. We only pick up the zeros with ${\left\lvert \gamma \right\rvert < T}$ in our rectangle, so we get a term

$\displaystyle -\sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho}.$

Letting ${U \rightarrow \infty}$ we derive that

\displaystyle \begin{aligned} &\phantom= \int_{c-iT}^{c+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds \\ &= \delta(\chi) x + E_{\text{contour}} + \sum_{m \ge 1} \frac{x^{a-2m}}{2m-a} - b(\chi) - (1-a) \log x - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho} \end{aligned}

at least for primitive characters. Note that the sum over the zeros is not absolutely convergent without the restriction to ${\left\lvert \gamma \right\rvert < T}$ (with it, the sum becomes a finite one).

4.2. Transition to nonprimitive characters

The next step is to notice that if ${\chi}$ modulo ${q}$ happens to be not primitive, and is induced by ${\tilde\chi}$ with modulus ${\tilde q}$, then actually ${\psi(x,\chi)}$ and ${\psi(x,\tilde\chi)}$ are not so different. Specifically, they differ by at most

\displaystyle \begin{aligned} \left\lvert \psi(x,\chi)-\psi(x,\tilde\chi) \right\rvert &\le \sum_{\substack{\gcd(n,\tilde q)=1 \\ \gcd(n,q) > 1 \\ n \le x}} \Lambda(n) \\ &\le \sum_{\substack{\gcd(n,q) > 1 \\ n \le x}} \Lambda(n) \\ &\le \sum_{p \mid q} \sum_{\substack{p^k \le x}} \log p \\ &\le \sum_{p \mid q} \log x \\ &\le (\log q)(\log x) \end{aligned}

and so our above formula in fact holds for any character ${\chi}$, if we are willing to add an error term of ${(\log q)(\log x)}$. This works even if ${\chi}$ is trivial, and also ${\tilde q \le q}$, so we will just simplify notation by omitting the tilde’s.

Anyways ${(\log q)(\log x)}$ is piddling compared to all the other error terms in the problem, and we can swallow a lot of the boring residues into a new term, say

$\displaystyle E_{\text{tiny}} \le (\log q + 1)(\log x) + 2.$

Thus we have

$\displaystyle \psi(x, \chi) = \delta(\chi) x + E_{\text{contour}} + E_{\text{truncate}} + E_{\text{tiny}} - b(\chi) - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho}.$

Unfortunately, the constant ${b(\chi)}$ depends on ${\chi}$ and cannot be absorbed. We will also estimate ${E_{\text{contour}}}$ in the error term party.

5. Distribution of Zeros

In order to estimate

$\displaystyle \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho}$

we will need information on both the vertical and horizontal distribution of the zeros. Also, it turns out this will help us compute ${E_{\text{contour}}}$.

Let ${\chi}$ be primitive modulo ${q}$. As we saw,

$\displaystyle \xi(s,\chi) = (q/\pi)^{\frac{1}{2} s + \frac{1}{2} a} \Gamma\left( \frac{s+a}{2} \right) L(s, \chi) \left( s(1-s) \right)^{\delta(\chi)}$

is entire. It also is easily seen to have order ${1}$, since no term grows much more than exponentially in ${s}$ (using Stirling to handle the ${\Gamma}$ factor). Thus by Hadamard, we may put

$\displaystyle \xi(s, \chi) = e^{A(\chi)+B(\chi)z} \prod_\rho \left( 1-\frac{z}{\rho} \right) e^{\frac{z}{\rho}}.$

Taking a logarithmic derivative and cleaning up, we derive the following lemma.

Lemma 7 (Hadamard Expansion of Logarithmic Derivative)

For any primitive character ${\chi}$ (possibly trivial) we have

\displaystyle \begin{aligned} -\frac{L'}{L}(s, \chi) &= \frac{1}{2} \log\frac{q}{\pi} + \frac{1}{2}\frac{\Gamma'(\frac{1}{2} s + \frac{1}{2} a)}{\Gamma(\frac{1}{2} s + \frac{1}{2} a)} \\ &- B(\chi) - \sum_{\rho} \left( \frac{1}{s-\rho} + \frac{1}{\rho} \right) + \delta(\chi) \cdot \left( \frac{1}{s-1} + \frac 1s \right). \end{aligned}

Proof: One one hand, we have

$\displaystyle \log \xi(s, \chi) = A(\chi) + B(\chi) s + \sum_\rho \left( \log \left( 1-\frac{s}{\rho} \right) + \frac{s}{\rho} \right).$

On the other hand

$\displaystyle \log \xi(s, \chi) = \frac{s+a}{2} \cdot \log \frac{q}{\pi} + \log \Gamma\left( \frac{s+a}{2} \right) + \log L(s, \chi) + \delta\chi(\log s + \log (1-s)).$

Taking the derivative of both sides and setting them equal: we have on the left side

$\displaystyle B(\chi) + \sum_{\rho} \left( \frac{1}{1-\frac{s}{\rho}} \cdot \frac{1}{-\rho} + \frac{1}{\rho} \right) = B(\chi) + \sum_\rho \left( \frac{1}{s-\rho} + \frac{1}{\rho} \right)$

and on the right-hand side

$\displaystyle \frac{1}{2} \log\frac{q}{\pi} + \frac{1}{2}\frac{\Gamma'}{\Gamma}\left( \frac{s+a}{2} \right) + \frac{L'}{L} (s, \chi) + \delta_\chi \left( \frac 1s + \frac{1}{s-1} \right).$

Equating these gives the desired result. $\Box$

This will be useful in controlling things later. The ${B(\chi)}$ is a constant that turns out to be surprisingly annoying; it is tied to ${b(\chi)}$ from the contour, so we will need to deal with it.

5.2. A Bound on the Logarithmic Derivative

Frequently we will take the real part of this. Using Stirling, the short version of this is:

Lemma 8 (Logarithmic Derivative Bound)

Let ${\sigma \ge 1}$ and ${\chi}$ be primitive (possibly trivial). Then

$\displaystyle \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] = \begin{cases} O(\mathcal L) - \text{Re } \sum_\rho \frac{1}{s-\rho} + \text{Re } \frac{\delta(\chi)}{s-1} & 1 \le \sigma \le 2 \\ O(\mathcal L) - \text{Re } \sum_\rho \frac{1}{s-\rho} & 1 \le \sigma \le 2, \left\lvert t \right\rvert \ge 2 \\ O(1) & \sigma \ge 2. \end{cases}$

Proof: The claim is obvious for ${\sigma \ge 2}$, since we can then bound the quantity by ${\frac{\zeta'(\sigma)}{\zeta(\sigma)} \le \frac{\zeta'(2)}{\zeta(2)}}$ due to the fact that the series representation is valid in that range. The second part with ${\left\lvert t \right\rvert \ge 2}$ follows from the first line, by noting that ${\text{Re } \frac{1}{s-1} < 1}$. So it just suffices to show that

$\displaystyle O(\mathcal L) - \text{Re } \sum_\rho \frac{1}{s-\rho} + \text{Re } \frac{\delta(\chi)}{s-1}$

where ${1 \le \sigma \le 2}$ and ${\chi}$ is primitive.

First, we claim that ${\text{Re } B(\chi) = - \text{Re } \sum \frac{1}{\rho}}$. We use the following trick:

$\displaystyle B(\chi) = \frac{\xi'(0,\chi)}{\xi(0,\chi)} = -\frac{\xi'(1,\overline{\chi})}{\xi(1,\overline{\chi})} = \overline{B(\chi)} - \sum_{\overline{\rho}} \left( \frac{1}{1-\overline{\rho}} + \frac{1}{\overline{\rho}} \right) + \frac{\delta(\chi)}{s-1}$

where the ends come from taking the logarithmic derivative directly. By switching ${1-\overline{\rho}}$ with ${\rho}$, the claim follows.

Then, the lemma follows rather directly; the ${\text{Re } \sum_\rho \frac{1}{\rho}}$ has miraculously canceled with ${\text{Re } B(\chi)}$. To be explicit, we now have

$\displaystyle - \text{Re } \frac{L'(s, \chi)}{L(s, \chi)} = \frac{1}{2} \log\frac{q}{\pi} + \frac{1}{2} \text{Re } \frac{\Gamma'(\frac{1}{2} s + \frac{1}{2} a)}{\Gamma(\frac{1}{2} s + \frac{1}{2} a)} - \sum_{\rho} \text{Re } \frac{1}{s-\rho} + \frac{\delta(\chi)}{s} + \frac{\delta(\chi)}{s-1}$

and the first two terms contribute ${\log q}$ and ${\log t}$, respectively; meanwhile the term ${\frac{\delta(\chi)}{s}}$ is at most ${1}$, so it is absorbed. $\Box$

Short version: our functional equation lets us relate ${L(s, \chi)}$ to ${L(1-s, \chi)}$ for ${\sigma \le 0}$ (in fact it’s all we have!) so this gives the following corresponding estimate:

Lemma 9 (Far-Left Estimate of Log Derivative)

If ${\sigma \le -1}$ and ${t \ge 2}$ we have

$\displaystyle \frac{L'(s, \chi)}{L(s, \chi)} = O\left[ \log q\left\lvert s \right\rvert \right].$

Proof:

We have

$\displaystyle L(1-s, \chi) = W(\chi) 2^{1-s} \pi^{-s} q^{s-\frac{1}{2}} \cos \frac{1}{2} \pi (s-a) \Gamma(s) L(s, \overline{\chi})$

(the unsymmetric functional equation, which can be obtained from Legendre’s duplication formula). Taking a logarithmic derivative yields

$\displaystyle \frac{L'}{L}(s, \chi) = \log \frac{q}{2\pi} - \frac{1}{2} \pi \tan \frac{1}{2} \pi(1-s-a) + \frac{\Gamma'}{\Gamma}(1-s) + \frac{L'}{L}(1-s, \overline{\chi}).$

Because we assumed ${\left\lvert t \right\rvert \ge 2}$, the tangent function is bounded as ${s}$ is sufficiently far from any of its poles along the real axis. Also since ${\text{Re }(1-s) \ge 2}$ implies the ${\frac{L'}{L}}$ term is bounded. Finally, the logarithmic derivative of ${\Gamma}$ contributes ${\log \left\lvert s \right\rvert}$ according to Stirling. So, total error is ${O(\log q) + O(1) + O(\log \left\lvert s \right\rvert) + O(1)}$ and this gives the conclusion. $\Box$

5.3. Horizontal Distribution

I claim that:

Theorem 10 (Horizontal Distribution Bound)

Let ${\chi}$ be a character, possibly trivial or imprimitive. There exists an absolute constant ${c_1}$ with the following properties:

1. If ${\chi}$ is complex, then no zeros are in the region ${\sigma \ge 1 - \frac{c_1}{\mathcal L}}$.
2. If ${\chi}$ is real, there are no zeros in the region ${\sigma \ge 1 - \frac{c_1}{\mathcal L}}$, with at most one exception; this zero must be real and simple.

Such bad zeros are called Siegel zeros, and I will denote them ${\beta_S}$. The important part about this estimate is that it does not depend on ${\chi}$ but rather on ${q}$. We need the relaxation to non-primitive characters, since we will use them in the proof of Landau’s Theorem.

Proof: First, assume ${\chi}$ is both primitive and nontrivial.

By the 3-4-1 lemma on ${\log L(\chi, s)}$ we derive that

$\displaystyle 3 \text{Re } \left[ -\frac{L'(\sigma, \chi_0)}{L(\sigma, \chi_0)} \right] + 4 \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] + \text{Re } \left[ -\frac{L'(\sigma+2it, \chi^2)}{L(\sigma+2it, \chi^2)} \right] \ge 0.$

This is cool because we already know that

$\displaystyle \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] < O(\mathcal L) - \text{Re } \sum_\rho \frac{1}{s-\rho}$

We now assume ${\sigma > 1}$.

In particular, we now have (since ${\text{Re } \rho < 1}$ for any zero ${\rho}$)

$\displaystyle \text{Re } \frac{1}{s-\rho} > 0.$

So we are free to throw out as many terms as we want.

If ${\chi^2}$ is primitive, then everything is clear. Let ${\rho = \beta + i \gamma}$ be a zero. Then

\displaystyle \begin{aligned} \text{Re } \left[ -\frac{L'(\sigma, \chi_0)}{L(\sigma, \chi_0)} \right] &\le \frac{1}{\sigma-1} + O(1) \\ \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] &\le O(\mathcal L) - \frac{1}{s-\rho} \\ \text{Re } \left[ -\frac{L'(\sigma+2it, \chi^2)}{L(\sigma+2it, \chi^2)} \right] &\le O(\mathcal L) \end{aligned}

where we have dropped all but one term for the second line, and all terms for the third line. If ${\chi^2}$ is not primitive but at least is not ${\chi_0}$, then we can replace ${\chi^2}$ with the inducing ${\tilde\chi_2}$ for a penalty of at most

\displaystyle \begin{aligned} \text{Re } \frac{L'}{L}(s, \tilde\chi) - \text{Re } \frac{L'}{L}(s, \chi^2) &< \text{Re } \sum_{p^k \mid q} \tilde\chi(p^k) \log p \cdot \text{Re } (p^k)^{-s} \\ &< \sum_{p \mid q} \log p \cdot (p^{-\sigma} + p^{-2\sigma} + \dots) \\ &< \sum_{p \mid q} \log p \cdot 1 \\ &\le \log q \end{aligned}

just like earlier: ${\Lambda}$ is usually zero, so we just look at the differing terms! The Dirichlet series really are practically the same. (Here we have also used the fact that ${\sigma > 1}$, and ${p \ge 2}$.)

Consequently, we derive using ${3-4-1}$ that

$\displaystyle \frac{3}{\sigma-1} - \frac{4}{s-\rho} + O(\mathcal L) \ge 0.$

Selecting ${s = \sigma + i \gamma}$ so that ${s - \rho = \sigma-\beta}$, we thus obtain

$\displaystyle \frac{4}{\sigma-\beta} \le \frac{3}{\sigma-1} + O(\mathcal L).$

If we select ${\sigma = 1 + \frac{\varepsilon}{\mathcal L}}$, we get

$\displaystyle \frac{4}{1 + \frac{\varepsilon}{\mathcal L} - \beta} \le O(\mathcal L)$

so

$\displaystyle \beta < 1 - \frac{c_2}{\mathcal L}$

for some constant ${c_2}$, initially only for primitive ${\chi}$.

But because the Euler product of the ${L}$-function of an imprimitive character versus its primitive inducing character differ by a finite number of zeros on the line ${\sigma=0}$ it follows that this holds for all nontrivial complex characters.

Unfortunately, if we are unlucky enough that ${\tilde\chi_2}$ is trivial, then replacing ${\chi^2}$ causes all hell to break loose. (In particular, ${\chi}$ is real in this case!) The problem comes in that our new penalty has an extra ${\frac{1}{s-1}}$, so

$\displaystyle \left\lvert \text{Re } \frac{L'}{L}(s, \chi^2) - \text{Re } \frac{\zeta'}{\zeta}(s) \right\rvert < \frac{1}{s-1} + \log q$

Applied with ${s = \sigma + 2it}$, we get the weaker

$\displaystyle \frac{3}{\sigma-1} - \frac{4}{s-\rho} + O(\mathcal L) + \frac{1}{\sigma - 1 + 2it} \ge 0.$

If ${\left\lvert t \right\rvert > \frac{\delta}{\log q}}$ for some ${\delta}$ then the ${\frac{1}{\sigma-1+2it}}$ term will be at most ${\frac{\log q}{\delta} = O(\mathcal L)}$ and we live to see another day. In other words, we have unconditionally established a zero-free region of the form

$\displaystyle \sigma > 1 - \frac{c(\delta)}{\mathcal L} \quad\text{and}\quad \left\lvert t \right\rvert > \frac{\delta}{\log q}$

for any ${\delta > 0}$.

Now let’s examine ${\left\lvert t \right\rvert < \frac{\delta}{\log q}}$. We don’t have the facilities to prove that there are no bad zeros, but let’s at least prove that the zero must be simple and real. By Hadamard at ${t=0}$, we have

$\displaystyle -\frac{L'(\sigma, \chi)}{L(\sigma, \chi)} < O(\mathcal L) - \sum_\rho \frac{1}{\sigma-\rho}$

where we no longer need the real parts since ${\chi}$ is real, and in particular the roots of ${L(s,\chi)}$ come in conjugate pairs. The left-hand side can be stupidly bounded below by

$\displaystyle -\frac{L'(\sigma, \chi)}{L(\sigma, \chi)} \ge - \sum_{n \ge 1} (-1) \cdot \log n \cdot n^{-\sigma} = \frac{\zeta'(\sigma)}{\zeta(\sigma)} > -\frac{1}{\sigma-1} - O(1).$

So

$\displaystyle -\frac{1}{\sigma-1} < O(\mathcal L) - \sum_\rho \frac{1}{\sigma-\rho}.$

In other words,

$\displaystyle \sum_\rho \text{Re } \frac{\sigma-\rho}{\left\lvert \sigma-\rho \right\rvert^2} < \frac{1}{\sigma-1} + O(\mathcal L).$

Then, let ${\sigma = 1 + \frac{2\delta}{\log q}}$, so

$\displaystyle \sum_\rho \text{Re } \frac{\sigma-\rho}{\left\lvert \sigma-\rho \right\rvert^2} < \frac{\log q}{2\delta} + O(\mathcal L).$

The rest is arithmetic; basically one finds that there can be at most one Siegel zero. In particular, since complex zeros come in conjugate pairs, that zero must be real.

It remains to handle the case that ${\chi = \chi_0}$ is the constant function giving ${1}$. For this, we observe that the ${L}$-function in question is just ${\zeta}$. Thus, we can decrease the constant ${c_2}$ to some ${c_1}$ in such a way that the result holds true for ${\zeta}$, which completes the proof. $\Box$

5.4. Vertical Distribution

We have the following lemma:

Lemma 11 (Sum of Zeros Lemma)

For all real ${t}$ and primitive characters ${\chi}$ (possibly trivial), we have

$\displaystyle \sum_\rho \frac{1}{4+(t-\gamma)^2} = O(\mathcal L).$

$\displaystyle \text{Re } -\frac{L'}{L}(s, \chi) = O(\mathcal L) - \sum_\rho \text{Re } \frac{1}{s-\rho}$

and we take ${s = 2 + it}$, noting that the left-hand side is bounded by a constant ${\frac{\zeta'}{\zeta}(2) = -0.569961}$. On the other hand, ${ \text{Re } \frac{1}{2+it-\rho} = \frac{\text{Re }(2+it-\rho)}{\left\lvert (2-\beta) + (t-\gamma)i \right\rvert^2} = \frac{2-\beta}{(2-\beta)^2+(t-\gamma)^2}}$ and

$\displaystyle \frac{1}{4+(t-\gamma)^2} \le \frac{2-\beta}{(2-\beta)^2+(t-\gamma)^2} \le \frac{2}{1+(t-\gamma)^2}$

as needed. $\Box$
From this we may deduce that

Lemma 12 (Number of Zeros Nearby ${T}$)

For all real ${t}$ and primitive characters ${\chi}$ (possibly trivial), the number of zeros ${\rho}$ with ${\gamma \in [t-1, t+1]}$ is ${O(\mathcal L)}$.

In particular, we may perturb any given ${T}$ by ${\le 2}$ so that the distance between it and the nearest zero is at least ${c_0 \mathcal L^{-1}}$, for some absolute constant ${c_0}$.

From this, using an argument principle we can actually also obtain the following: For a real number ${T > 0}$, we have ${ N(T, \chi) = \frac{T}{\pi} \log \frac{qT}{2\pi e} + O(\mathcal L)}$ is the number of zeros of ${L(s, \chi)}$ with imaginary part ${\gamma \in [-T, T]}$. However, we will not need this fact.

6. Error Term Party

Up to now, ${c}$ has been arbitrary. Assume now ${x \ge 6}$; thus we can now follow the tradition

$\displaystyle c = 1 + \frac{1}{\log x} < 2$

so ${c}$ is just to the right of the critical line. This causes ${x^c = ex}$. We assume also for convenience that ${T \ge 2}$.

6.1. Estimating the Truncation Error

Recall that

$\displaystyle \left\lvert E(y,T) \right\rvert < \begin{cases} y^c \min \left\{ 1, \frac{1}{T \left\lvert \log y \right\rvert} \right\} & y \neq 1 \\ cT^{-1} & y=1. \end{cases}$

We need to bound the right-hand side of

$\displaystyle \left\lvert E_{\text{truncate}} \right\rvert \le \sum_{n \ge 1} \left\lvert \Lambda(n) \chi(n) \cdot E(x/n, T) \right\rvert = \sum_{n \ge 1} \Lambda(n) \left\lvert E(x/n, T) \right\rvert.$

If ${\frac34 x \le n \le \frac 54x}$, the log part is small, and this is bad. We have to split into three cases: ${\frac34 x \le n < x}$, ${n = x}$, and ${x < n \le \frac 54x}$. This is necessary because in the event that ${\Lambda(x) \neq 0}$ (${x}$ is a prime power), then ${E(x/n,T) = E(1,T)}$ needs to be handled differently.

We let ${x_{\text{left}}}$ and ${x_{\text{right}}}$ be the nearest prime powers to ${x}$ other than ${x}$ itself. Thus this breaks our region to conquer into

$\displaystyle \frac 34 x \le x_{\text{left}} < x < x_{\text{right}} \le \frac 54 x.$

So we have possibly a center term (if ${x}$ is a prime power, we have a term ${n=x}$), plus the far left interval and the far right interval. Let ${d = \min\left\{ x-x_{\text{left}}, x_{\text{right}}-x \right\}}$ for convenience.

• In the easy case, if ${n = x}$ we have a contribution of ${E(1,T) \log x < \frac{c}{T}\log x}$, which is piddling (less than ${\log x}$).
• Suppose ${\frac 34x \le n \le x_{\text{left}} - 1}$. If ${n = x_{\text{left}} - a}$ for some integer ${1 \le a \le \frac 14x}$, then

$\displaystyle \log \frac xn \ge \log \frac{x_{\text{left}}}{x_{\text{left}}-a} = -\log\left( 1 - \frac{a}{x_{\text{left}}} \right) \ge \frac{a}{x_{\text{left}}}$

by using the silly inequality ${-\log(1-t) \ge t}$ for ${t < 1}$. So the contribution in total is at most

\displaystyle \begin{aligned} \sum_{1 \le a \le \frac 14 x} \Lambda(n) \cdot (x/n)^c \cdot \frac{1}{T \cdot \frac{a}{x_{\text{left}}}} &\le \frac{x_{\text{left}}}{T} \sum_{1 \le a \le \frac 14 x} \Lambda(n) \cdot \left( \frac 43 \right)^2 \frac 1a \\ &\le \frac{16}{9} \frac{x_{\text{left}}}{T} \log x \sum_{1 \le a \le \frac 14 x} \frac 1a \\ &\le \frac{16}{9} \frac{(x-1) (\log x)(\log \frac 14 x + 2)}{T} \\ &\le \frac{1.9x (\log x)^2}{T} \end{aligned}

provided ${x \ge 7391}$.

• If ${n = x_{\text{left}}}$, we have

$\displaystyle \log \frac xn = -\log\left( 1 - \frac{x-x_{\text{left}}}{x} \right) > \frac{d}{x}$

Hence in this case, we get an error at most

\displaystyle \begin{aligned} \Lambda(x_{\text{left}}) \left( \frac{x}{x_{\text{left}}} \right)^c \min \left\{ 1, \frac{x}{Td} \right\} &< \Lambda(x_{\text{left}}) \left( \frac 43 \right)^2 \min \left\{ 1, \frac{x}{Td} \right\} \\ &\le \frac{16}{9} \log x \min \left\{ 1, \frac{x}{Td} \right\}. \end{aligned}

• The cases ${n = x_{\text{right}}}$ and ${x_{\text{right}} + 1 \le n < \frac 54x}$ give the same bounds as above, in the same way.

Finally, if for ${x}$ outside the interval mentioned above, we in fact have ${\left\lvert \log x/n \right\rvert > \frac{1}{5}}$, say, and so all terms contribute at most

\displaystyle \begin{aligned} \sum_n \Lambda(n) \cdot (x/n)^c \cdot \frac{1}{T \log \left\lvert x/n \right\rvert} &\le \frac{5x^c}{T} \sum_n \Lambda(n) \cdot n^{-c} \\ &= \frac{5ex}{T} \cdot \left\lvert -\frac{\zeta'}{\zeta} (c) \right\rvert \\ &< \frac{5ex}{T} \cdot \left( \frac{1}{c-1} + 0.5773 \right) \\ &\le \frac{14x \log x}{T}. \end{aligned}

(Recall ${\zeta'/\zeta}$ had a simple pole at ${s=1}$, so near ${s=1}$ it behaves like ${\frac{1}{s-1}}$.)

The sum of everything is ${\le \frac{3.8x(\log x)^2+14x\log x}{T} + \frac{32}{9} \log x \min \left\{ 1, \frac{x}{Td} \right\}}$. Hence, the grand total across all these terms is the horrible

$\displaystyle \boxed{ E_{\text{truncate}} \le \frac{5x(\log x)^2}{T} + 3.6\log x \min \left\{ 1, \frac{x}{Td} \right\}}$

provided ${x \ge 1.2 \cdot 10^5}$.

6.2. Estimating the Contour Error

We now need to measure the error along the contour, taken from ${U \rightarrow \infty}$. Throughout assume ${U \ge 3}$. Naturally, to estimate the integral, we seek good estimates on

$\displaystyle \left\lvert \frac{L'}{L}(\sigma) \right\rvert.$

For this we appeal to the Hadamard expansion. We break into a couple cases.

• First, let’s look at the integral when ${-1 \le \sigma \le 2}$, so ${s = \sigma \pm iT}$ with ${T}$ large. We bound the horizontal integral along these regions; by symmetry let’s consider just the top

$\displaystyle \int_{-1+iT}^{c+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds.$

Thus we want an estimate of ${-\frac{L'}{L}}$.

Lemma 13

Let ${s}$ be such that ${-1 \le \sigma \le 2}$, ${\left\lvert t \right\rvert \ge 2}$. Assume ${\chi}$ is primitive (possibly trivial), and that ${t}$ is not within ${c_0\mathcal L^{-1}}$ of any zeros of ${L(s, \chi)}$. Then

$\displaystyle \frac{L'(s, \chi)}{L(s, \chi)} = O(\mathcal L^2)$

Proof: Since we assumed that ${T \ge 2}$ we need not worry about ${\frac{\delta(\chi)}{s-1}}$ and so we obtain

$\displaystyle \frac{L'(s, \chi)}{L(s, \chi)} = -\frac{1}{2} \log\frac{q}{\pi} - \frac{1}{2}\frac{\Gamma'(\frac{1}{2} s + \frac{1}{2} a)}{\Gamma(\frac{1}{2} s + \frac{1}{2} a)} + B(\chi) + \sum_{\rho} \left( \frac{1}{s-\rho} + \frac{1}{\rho} \right).$

and we eliminate ${B(\chi)}$ by computing

$\displaystyle \frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} - \frac{L'(2+it, \chi)}{L(2+it, \chi)} = E_{\text{gamma}} + \sum_{\rho} \left( \frac{1}{\sigma+it-\rho} - \frac{1}{2+it-\rho} \right).$

where

$\displaystyle E_{\text{gamma}} = \frac{1}{2}\frac{\Gamma'(\frac{1}{2} (2+it) + \frac{1}{2} a)}{\Gamma(\frac{1}{2} (2+it) + \frac{1}{2} a)} - \frac{1}{2}\frac{\Gamma'(\frac{1}{2} (\sigma+it) + \frac{1}{2} a)}{\Gamma(\frac{1}{2} (\sigma+it) + \frac{1}{2} a)} \ll \log T$

by Stirling (here we use the fact that ${-1 \le \sigma \le 2}$). For the terms where ${\gamma \notin [t-1, t+1]}$ we see that

\displaystyle \begin{aligned} \left\lvert \frac{1}{\sigma+it-\rho} - \frac{1}{2+it-\rho} \right\rvert &= \frac{2-\sigma}{\left\lvert \sigma+it-\rho \right\rvert \left\lvert 2+it-\rho \right\rvert} \\ &\le \frac{2-\sigma}{\left\lvert \gamma-t \right\rvert^2} \le \frac{3}{\left\lvert \gamma-t \right\rvert^2} \\ &\le \frac{6}{\left\lvert \gamma-t \right\rvert^2+1}. \end{aligned}

So the contribution of the sum for ${\left\lvert \gamma-t \right\rvert \ge 1}$ can be bounded by ${O(\mathcal L)}$, via the vertical sum lemma.

As for the zeros with smaller imaginary part, we at least have ${\left\lvert 2+it-\rho \right\rvert = \left\lvert 2-\beta \right\rvert > 1}$ and thus we can reduce the sum to just

$\displaystyle \frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} - \frac{L'(2+it, \chi)}{L(2+it, \chi)} = \sum_{\gamma\in[t-1,t+1]} \frac{1}{\sigma+it-\rho} + O(\mathcal L).$

Now by the assumption that ${\left\lvert \gamma-t \right\rvert \ge c\mathcal L^{-1}}$; so the terms of the sum are all at most ${O(\mathcal L)}$. Also, there are ${O(\mathcal L)}$ zeros with imaginary part in that range. Finally, we recall that ${\frac{L'(2+it, \chi)}{L(2+it, \chi)}}$ is bounded; we can write it using its (convergent) Dirichlet series and then note it is at most ${\frac{\zeta'(2+it)}{\zeta(2+it)} \le \frac{\zeta'(2)}{\zeta(2)}}$. $\Box$
At this point, we perturb ${T}$ as described in vertical distribution so that the lemma applies, and use can then compute

\displaystyle \begin{aligned} \left\lvert \int_{-1+iT}^{c+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds \right\rvert &< O(\mathcal L^2) \cdot \int_{-1+iT}^{c+iT} \left\lvert \frac{x^s}{s} \right\rvert \; ds \\ &< O(\mathcal L^2) \int_{-1}^c \frac{x^\sigma}{2T} \; d\sigma \\ &< O(\mathcal L^2) \cdot \frac{x^{c+1}-1}{T \log x} \\ &< O\left(\frac{\mathcal L^2 x}{T \log x}\right). \end{aligned}

• Next, for the integral ${-U \le \sigma \le 1}$, we use the “far-left” estimate to obtain

\displaystyle \begin{aligned} \left\lvert \int_{-U+iT}^{-1+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds \right\rvert &\ll \int_{-\infty+iT}^{-1+iT} \left\lvert \frac{x^s}{s} \right\rvert \cdot \log q \left\lvert s \right\rvert \; ds \\ &\ll \int_{-\infty+iT}^{-1+iT} \left\lvert \frac{x^s}{s} \right\rvert \cdot \log q \left\lvert s \right\rvert \; ds \\ &\ll \log q \int_{-\infty+iT}^{-1+iT} \left\lvert \frac{x^s}{s} \right\rvert \; ds + \int_{-\infty+iT}^{-1+iT} \left\lvert \frac{x^s \log \left\lvert s \right\rvert}{s} \right\rvert \; ds \\ &< \log q \int_{-\infty+iT}^{-1+iT} \left\lvert \frac{x^s}{T} \right\rvert \; ds + \int_{-\infty}^{-1} \left\lvert \frac{x^s \log T}{T} \right\rvert \; ds \\ &\ll \frac{\log q}{T} \int_{-\infty}^{-1} x^\sigma \; d\sigma + \frac{\log T}{T} \int_{-\infty}^{-1} x^\sigma \; d\sigma \\ &< \frac{\mathcal L}{T} \left( \frac{x^{-1}}{\log x} \right) = \frac{\mathcal L}{T x \log x}. \end{aligned}

So the contribution in this case is ${O\left( \frac{\mathcal L}{T x \log x} \right)}$.

• Along the horizontal integral, we can use the same bound

\displaystyle \begin{aligned} \left\lvert \int_{-U-iT}^{-U+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds \right\rvert &\ll \int_{-U-iT}^{-U+iT} \left\lvert \frac{x^s}{s} \right\rvert \cdot \log q \left\lvert s \right\rvert \; ds \\ &= x^{-U} \cdot \int_{-U-iT}^{-U+iT} \frac{\log q \left\lvert s \right\rvert}{\left\lvert s \right\rvert} \; ds \\ &= x^{-U} \cdot \int_{-U-iT}^{-U+iT} \frac{\log q + \log U}{U} \; ds \\ &= \frac{2T(\log q + \log U)}{Ux^U} \end{aligned}

which vanishes as ${U \rightarrow \infty}$.

So we only have two error terms, ${O\left( \frac{\mathcal L^2 x}{T \log x} \right)}$ and ${O\left( \frac{\mathcal L}{Tx\log x} \right)}$. The first is clearly larger, so we end with

$\displaystyle \boxed{E_{\text{contour}} \ll \frac{\mathcal L^2x}{T \log x}}.$

6.3. The term ${b(\chi)}$

We can estimate ${b(\chi)}$ as follows:

Lemma 14

For primitive ${\chi}$. we have

$\displaystyle b(\chi) = O(\log q) - \sum_{\left\lvert \gamma \right\rvert < 1} \frac{1}{\rho}$

Proof: The idea is to look at ${\frac{L'}{L}(s,\chi)-\frac{L'}{L}(2,\chi)}$. By subtraction, we obtain

\displaystyle \begin{aligned} \frac{L'}{L}(s, \chi) -\frac{L'}{L}(2, \chi) &= - \frac{\Gamma'}{\Gamma} \left( \frac{s+a}{2} \right) + \frac{\Gamma'}{\Gamma} \left( \frac{2+a}{2} \right) \\ &- \frac rs - \frac r{s-1} + \frac r2 + \frac r1 \\ &+ \sum_\rho \left( \frac{1}{s-\rho} - \frac{1}{2-\rho} \right) \end{aligned}

Then at ${s=0}$ (eliminating the poles), we have

$\displaystyle \frac{L'}{L}(s, \chi) = O(1) - \sum_{\rho} \left( \frac{1}{\rho}+\frac{1}{2-\rho} \right)$

where the ${O(1)}$ is ${\frac{L'}{L}(2,\chi) + \frac r2 + \gamma + \frac{\Gamma'}{\Gamma}(1)}$ if ${a=0}$ and ${\frac{L'}{L}(2,\chi) + \frac r2 - \frac{\Gamma'}{\Gamma}(\frac{1}{2}) + \frac{\Gamma'}{\Gamma}(\frac32)}$ for ${a=1}$. Furthermore,

$\displaystyle \sum_{\rho, \left\lvert \gamma \right\rvert > 1} \left( \frac{1}{\rho}+\frac{1}{2-\rho} \right) \le \sum_{\rho, \left\lvert \gamma \right\rvert > 1} \frac{2}{\left\lvert \rho(2-\rho) \right\rvert} < 2 \sum_{\rho, \left\lvert \gamma \right\rvert > 1} \frac{1}{\left\lvert 2-\rho \right\rvert^2}$

which is ${O(\log q)}$ by our vertical distribution results, and similarly

$\displaystyle \sum_{\rho, \left\lvert \gamma \right\rvert < 1} \frac{1}{2-\rho} = O(\log q).$

This completes the proof. $\Box$

Let ${\beta_1}$ be a Siegel zero, if any; for all the other zeros, we have that ${\left\lvert \frac{1}{\rho} \right\rvert = \frac{1}{\beta^2+\gamma^2}}$. We now have two cases.

• ${\overline{\chi} \neq \chi}$. Then ${\overline{\chi}}$ is complex and thus has no exceptional zeros; hence each of its zeros has ${\beta < 1 - \frac{c}{\log q}}$; since ${\overline{\rho}}$ is a zero of ${\overline{\chi}}$ if and only if ${1-\rho}$ is a zero of ${\chi}$, it follows that all zeros of ${\chi}$ are have ${\left\lvert \frac{1}{\rho} \right\rvert < O(\log q)}$. Moreover, in the range ${\gamma \in [-1,1]}$ there are ${O(\log q)}$ zeros (putting ${T=0}$ in our earlier lemma on vertical distribution).

Thus, total contribution of the sum is ${O\left( (\log q)^2 \right)}$.

• If ${\overline{\chi} = \chi}$, then ${\chi}$ is real. The above argument goes through, except that we may have an extra Siegel zero at ${\beta_S}$; hence there will also be a special zero at ${1 - \beta_S}$. We pull these terms out separately.Consequently,

$\displaystyle \boxed{b(\chi) = O\left( (\log q)^2 \right) - \frac{1}{\beta_S} - \frac{1}{1-\beta_S}}.$

By adjusting the constant, we may assume ${\beta_S > \frac{2014}{2015}}$ if it exists.

7. Computing ${\psi(x,\chi)}$ and ${\psi(x;q,a)}$

7.1. Summing the Error Terms

We now have, for any ${T \ge 2}$, ${x \ge 6}$, and ${\chi}$ modulo ${q}$ possibly primitive or trivial, the equality

$\displaystyle \psi(x, \chi) = \delta(\chi) x + E_{\text{contour}} + E_{\text{truncate}} + E_{\text{tiny}} - b(\chi) - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho}.$

where

\displaystyle \begin{aligned} E_{\text{contour}} &\ll \frac{x(\log x)^2}{T} + \log x \min \left\{ 1, \frac{x}{Td} \right\} \\ E_{\text{truncate}} &\ll \frac{\mathcal L^2 x}{T \log x} \\ E_{\text{tiny}} &\ll \log x \log q \\ b(\chi) &= O\left( (\log q)^2 \right) - \frac{1}{\beta_S} - \frac{1}{1-\beta_S}. \end{aligned}

Assume now that ${T \le x}$, and ${x}$ is an integer (hence ${d \ge 1}$). Then aggregating all the errors gives

$\displaystyle \psi(x, \chi) = \delta(\chi) x - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho} - \frac{x^{\beta_S}-1}{\beta_S} - \frac{x^{1-\beta_S}-1}{1-\beta_S} + O\left( \frac{x (\log qx)^2}{T} \right).$

where the sum over ${\rho}$ now excludes the Siegel zero. We can omit the terms ${\beta_S^{-1} < \frac{2015}{2014} = O(1)}$, and also

$\displaystyle \frac{x^{1-\beta_S}-1}{1-\beta_S} < x^{\frac{1}{2015}} \log x.$

Absorbing things into the error term,

$\displaystyle \psi(x, \chi) = \delta(\chi) x - \frac{x^{\beta_S}}{\beta_S} - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho} + O\left( \frac{x (\log qx)^2}{T} + x^{\frac{1}{2015}} \log x \right).$

7.2. Estimating the Sum Over Zeros

Now we want to estimate

$\displaystyle \sum_{\rho, \left\lvert \gamma \right\rvert < T} \left\lvert \frac{x^\rho}{\rho} \right\rvert.$

We do this is the dumbest way possible: putting a bound on ${x^\rho}$ and pulling it out.

For any non-Siegel zero, we have a zero-free region ${\beta < 1 - \frac{c_1}{\mathcal L}}$, whence

$\displaystyle \left\lvert x^\rho \right\rvert < x^{\beta} = x \cdot x^{\beta-1} = x \exp\left( \frac{-c_1 \log x}{\mathcal L} \right).$

Pulling this out, we can then estimate the reciprocals by using our differential:

\displaystyle \begin{aligned} \sum_{\rho, \left\lvert \gamma \right\rvert < T} \left\lvert \frac{1}{\rho} \right\rvert < \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{1}{\gamma} < \sum_{t=1}^T \frac{\log q(t+2)}{t} \ll (\log qT)^2 \le (\log qx)^2. \end{aligned}

Hence,

$\displaystyle \psi(x, \chi) = \delta(\chi) x - \frac{x^{\beta_S}}{\beta_S} + O\left( \frac{x (\log qx)^2}{T} + x^{\frac{1}{2015}} \log x + (\log qx)^2 \cdot x \exp\left( \frac{-c_1 \log x}{\mathcal L} \right) \right).$

We select

$\displaystyle T = \exp\left(c_3 \sqrt{\log x}\right)$

for some constant ${c_3}$, and moreover assume ${q \le T}$, then we obtain

$\displaystyle \psi(x, \chi) = \delta(\chi) x - \frac{x^{\beta_S}}{\beta_S} + O\left( x \exp\left( -c_4 \sqrt{\log x} \right) \right).$

7.3. Summing Up

We would like to sum over all characters ${\chi}$. However, we’re worried that there might be lots of Siegel zeros across characters. A result of Landau tells us this is not the case:

Theorem 15 (Landau)

If ${\chi_1}$ and ${\chi_2}$ are real nontrivial primitive characters modulo ${q_1}$ and ${q_2}$, then for any zeros ${\beta_1}$ and ${\beta_2}$ we have

$\displaystyle \min \left\{ \beta_1, \beta_2 \right\} < 1 - \frac{c_5}{\log q_1q_2}$

for some fixed absolute ${c_5}$. In particular, for any fixed ${q}$, there is at most one ${\chi \mod q}$ with a Siegel zero.

Proof: The character ${\chi_1\chi_2}$ is not trivial, so we can put

\displaystyle \begin{aligned} -\frac{\zeta'}{\zeta}(\sigma) &= \frac{1}{\sigma-1} + O(1) \\ -\frac{L'}{L}(\sigma, \chi_1\chi_2) &= O(\log q_1q_2) \\ -\frac{L'}{L}(\sigma, \chi_1) &= O(\log q_1) - \frac{1}{\sigma-\beta_i} \\ -\frac{L'}{L}(\sigma, \chi_2) &= O(\log q_2) - \frac{2}{\sigma-\beta_i}. \end{aligned}

Now we use a silly trick:

$\displaystyle 0 \le -\frac{\zeta'}{\zeta}(\sigma) -\frac{L'}{L}(\sigma, \chi_1) -\frac{L'}{L}(\sigma, \chi_2) -\frac{L'}{L}(\sigma, \chi_1\chi_2)$

by “Simon’s Favorite Factoring Trick” (we use the deep fact that ${(1+\chi_1)(1+\chi_2) \ge 0}$, the analog of ${3-4-1}$). The upper bounds give now

$\displaystyle \frac{1}{\sigma-\beta_1} + \frac{1}{\sigma-\beta_2} < \frac{1}{\sigma-1} + O(\log q_1 \log q_2).$

and one may deduce the conclusion from here. $\Box$

We now sum over all characters ${\chi}$ as before to obtain

$\displaystyle \psi(x; q, a) = \frac{1}{\phi(q)} x - \frac{\chi_S(a)}{\phi(q)} \frac{x^{\beta_S}}{\beta_S} + O \left( x \exp\left(-c_6 \sqrt{\log x}\right) \right)$

where ${\chi_S = \overline{\chi}_S}$ is the character with a Siegel zero, if it exists.

8. Siegel’s Theorem, and Finishing Off

The term with ${x^{\beta_S} / \beta_S}$ is bad, and we need some way to get rid of it. We now appeal to Siegel’s Theorem:

Theorem 16 (Siegel’s Theorem)

For any ${\varepsilon > 0}$ there is a ${C_1(\varepsilon) > 0}$ such that any Siegel zero ${\beta_S}$ satisfies

$\displaystyle \beta_S < 1-C_1(\varepsilon) q^{-\varepsilon}.$

Thus for a positive constant ${N}$, assuming ${q \le (\log x)^N}$, letting ${\varepsilon = (2N)^{-1}}$ means ${q^{-\varepsilon} > \frac{1}{\sqrt{\log x}}}$, so we obtain

$\displaystyle x^{\beta_S} < x \exp\left( -C_1(\varepsilon) \log x q^{-\varepsilon} \right) < x \exp\left( -C_1(\varepsilon) \sqrt{\log x} \right).$

Then

\displaystyle \begin{aligned} \psi(x; q, a) &= \frac{1}{\phi(q)} x - \frac{\chi_S(a)}{\phi(q)} \frac{x^{\beta_S}}{\beta_S} + O \left( x \exp\left(-c_6 \sqrt{\log x}\right) \right) \\ &\le \frac{1}{\phi(q)} x + O\left( x \exp\left( -C_1(\varepsilon) \sqrt{\log x} \right) \right) + O \left( x \exp\left(-c_6 \sqrt{\log x}\right) \right) \\ &\le \frac{1}{\phi(q)} x + O\left( x \exp\left( -C(N) \sqrt{\log x} \right) \right) \end{aligned}

where ${C(N) = \min \left\{ C_1(\varepsilon), c_6 \right\}}$. This completes the proof of Dirichlet’s Theorem.

Zeros and Primes

Prerequisites for this post: previous post, and complex analysis. For this entire post, ${s}$ is a complex variable with ${s = \sigma + it}$.

1. The ${\Gamma}$ function

So there’s this thing called the Gamma function. Denoted ${\Gamma(s)}$, it is defined by

$\displaystyle \Gamma(s) = \int_0^{\infty} x^{s-1} e^{-x} \; dx$

as long as ${\sigma > 0}$. Here are its values at the first few integers:

\displaystyle \begin{aligned} \Gamma(1) &= 1 \\ \Gamma(2) &= 1 \\ \Gamma(3) &= 2 \\ \Gamma(4) &= 6 \\ \Gamma(5) &= 24. \end{aligned}

Yep: the ${\Gamma}$ function is contrived so that the identity

$\displaystyle \Gamma(s+1) = s\Gamma(s)$

always holds. (Proof: integration by parts.) Thus ${\Gamma}$ interpolates the factorial function a nice way. Moreover, this identity lets us extend ${\Gamma}$ to a meromorphic function on the entire complex plane.

We like the ${\Gamma}$ function because, unlike ${\zeta}$, we know what it’s doing. For example we actually know how to compute ${\Gamma(n)}$ for any positive integer ${n}$. (Contrast this with, say, ${\zeta(3)}$; we don’t even know if it’s an algebraic number, and it took until 1978 to prove that it was irrational). More to the point, we also know where all its zeros and poles are:

Proposition 1

Let ${\Gamma}$ be defined as above, extended to a meromorphic function on all of ${\mathbb C}$.

• The function ${\Gamma}$ has no zeros.
• It has a simple pole at ${s= 0, -1, -2, \dots}$, and these are the only poles.

The pole at ${s=0}$ should not be surprising: plug in ${s=0}$ to ${\Gamma(s+1)=s\Gamma(s)}$.

In any case, moral of story: ${\Gamma}$ is very friendly!

2. Functional Equation for Zeta

We will now do something really magical, due to Riemann. Pick an integer ${n}$; we set ${t = \pi n^2x}$ to derive the artificial equation

$\displaystyle \Gamma(s) = (n^2\pi)^{s} \int_0^{\infty} x^{s-1} e^{-n^2\pi x} \; dx.$

Replacing ${s}$ with ${\frac{1}{2} s}$ and rearranging gives

$\displaystyle \pi^{-s/2} \cdot \Gamma(s/2) n^{-s} = \int_0^\infty x^{s/2 - 1} e^{-n^2\pi x} \; dx \qquad \sigma > 0.$

Then, due to absolute convergence we can sum over ${n \ge 1}$; this brings the Riemann zeta function into the right-hand side, to get

$\displaystyle \pi^{-s/2} \cdot \Gamma(s/2) \zeta(s) = \int_0^\infty x^{s/2 - 1} \frac{\theta(x)-1}{2} \; dx \qquad \sigma > 1$

where

$\displaystyle \theta(x) = \sum_{n=-\infty}^{\infty} e^{-n^2\pi x};$

so that ${\frac{\theta(x)-1}{2}}$ gives just the sum for ${n \ge 1}$. It turns out that this ${\theta}$ is special: a Jacobi theta function, which happens to satisfy

$\displaystyle \theta(x^{-1}) = \sqrt{x} \cdot \theta(x) \quad \forall x > 0.$

Also, ${\omega(x) = O(e^{-\pi x})}$ for ${x > 0}$.

Using this and with some purely algebraic manipulations (namely, splitting the integral into two parts, ${0 \le x \le 1}$ and ${x \ge 1}$, and then using the property of the theta function), one can derive that

$\displaystyle \pi^{-s/2}\Gamma(s/2)\zeta(s) = \frac1{s-1} - \frac 1s + \int_1^\infty \frac{\theta(x)-1}{2} \cdot \left( x^{s/2-1} + x^{(1-s)/2-1} \right) \; dx.$

The right-hand side has two very useful properties:

• It is even around ${\frac{1}{2}}$, meaning it remains the same when ${s}$ is replaced by ${1-s}$.
• The integral is actually an entire function on all of ${\mathbb C}$ (the integral converges for all ${s}$ because ${\theta(x) = O(e^{-\pi x})}$).

So if we multiply both sides by ${\frac{1}{2} s(s-1)}$, we get a symmetric entire function, called the ${\xi}$ function:

Theorem 2

The function

$\displaystyle \xi(s) = \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(s/2) \zeta(s)$

satisfies

$\displaystyle \xi(s) = \xi(1-s)$

and is an entire function on all of ${\mathbb C}$.

In particular we can use this to extend ${\zeta}$ to a meromorphic function on the entire complex plane. We have

$\displaystyle \pi^{-\frac{1}{2} s} \Gamma\left( \frac s2 \right) \zeta(s) = \pi^{-\frac{1}{2} (1-s)} \Gamma\left( \frac{1-s}{2} \right) \zeta(1-s).$

We can count zeros and poles from this. The ${\pi}$‘s don’t do anything. The ${\Gamma}$‘s have no zeros and just poles at non-positive integers.

Since ${\zeta}$ has no zeros and no poles other than ${s = 1}$ for ${\sigma > 0}$, the only zeros it will have are for ${\sigma \le 0}$ are at ${s = -2, -4, -6, \dots}$ in order to cancel out the poles of ${\Gamma(s/2)}$. And so these are the so-called trivial zeros.

On the other hand in the strip ${0 < \sigma < 1}$ we get very confused. More on that later.

3. Explicit Formula for Chebyshev Function

Recall that last post we had

$\displaystyle \psi(x) = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} -\frac{\zeta'(s)}{\zeta(s)} \cdot \frac{x^s}{s} \; ds$

according to Perron’s Formula. Write this as

$\displaystyle \psi(x) = \frac{1}{2\pi i} \lim_{T \rightarrow \infty} \left( \int_{c-iT}^{c+iT} \frac{\zeta'(s)}{\zeta(s)} \cdot -\frac{x^s}{s} \; ds \right)$

Recall that if ${f}$ is a meromorphic function then so is

$\displaystyle \frac{1}{2\pi i} \cdot \frac{f'}{f}$

which has a simple pole for each zero/pole of ${f}$; the residue is ${+1}$ for each simple zero (more generally ${+m}$ for a zero of multiplicity ${m}$) and ${-1}$ for each simple pole (more generally ${-m}$ for a pole of order ${m}$). (See the argument principle).

Now we are going to do use the bestest theorem from complex analysis: Cauchy’s Residue Theorem. We replace the path ${c-iT \rightarrow c+iT}$ with ${c-iT \rightarrow -U-iT \rightarrow U + iT \rightarrow c+iT}$. Doing so picks up the following residues:

• Since ${\zeta}$ has a pole at ${s=1}$, ${\zeta'/\zeta}$ has a simple pole there of residue ${-1}$, so we pick up a residue of ${+x}$ corresponding to

$\displaystyle (-1) \cdot -x^1/1 = +x.$

This is the “main term”.

• ${\zeta}$ has simple zeros at ${s=-2,-4,-6,\dots}$. So we pick up residues of ${-\frac{x^2}{2}}$, ${-\frac{x^4}{4}}$, and so on, or

$\displaystyle - \sum_{n \ge 1} \frac{x^{2n}}{2n} = -\frac{1}{2} \log(1-x^{-2}).$

• ${x^s/s}$ itself has a pole at ${s=0}$, which gives us an additional term

$\displaystyle -\frac{\zeta'(0)}{\zeta(0)}.$

It turns out that this equals ${\log(2\pi)}$, because why not.

• Finally, the hard-to-understand zeros in the strip ${0 < \sigma < 1}$. If ${\rho = \beta+i\gamma}$ is a zero, then it contributes a residue of ${-\frac{x^\rho}{\rho}}$. We only pick up the zeros with ${\left\lvert \gamma \right\rvert < T}$ in our rectangle, so we get a term

$\displaystyle \sum_{\substack{\rho, \left\lvert \gamma \right\rvert < T}} \frac{x^\rho}{\rho}$

In what follows, ${\rho}$ always denotes a zero in the critical strip, written ${\rho = \beta + i \gamma}$.

Putting this all together, we obtain that

$\displaystyle \psi(x) = \frac{1}{2\pi i} \lim_{U \rightarrow \infty} \lim_{T \rightarrow \infty} \left( x - \frac{1}{2} \log(1-x^{-2}) - \log(2\pi) - \sum_{\substack{\rho, \left\lvert \gamma \right\rvert < T}} \frac{x^\rho}{\rho} + I(x,U,T) \right)$

where ${I(x,U,T)}$ is the integral along the three sides of the rectangle. With some effort, you can show that in fact ${I(x,U,T) \rightarrow 0}$; but the speed of convergence depends on ${x}$. To avoid this, we can take ${U \rightarrow \infty}$ but leave ${T}$ intact, which gives the explicit formula:

Theorem 3 (Explicit Formula for the Chebyshev Function)

For ${x \ge 2}$

$\displaystyle \psi(x) = x - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho} - \log(2\pi) - \frac{1}{2}\log(1-x^{-2}) + R(x,T)$

where ${R(x,T)}$ is some error term, with the property that ${R(x,T) \rightarrow 0}$ as ${T \rightarrow \infty}$ for any fixed ${x}$.

(It’s necessary to use a truncated sum since the series ${\sum_{\rho} \frac{1}{\left\lvert \rho \right\rvert}}$ actually converges only conditionally, and not absolutely.)

The error term is ugly enough that I haven’t included it in the formula, but if you want to know I’ll at least write it down: it is

$\displaystyle R(x,T) = O\left( \frac{x \log(xT)^2}{T} + \log x \min \left\{ 1, \frac{x}{T\left} \right\} \right)$

where ${\left}$ is the distance from ${x}$ to the nearest prime power (${\ne x}$). Clearly, for any fixed ${x}$, ${R(x,T) \rightarrow 0}$ as ${T \rightarrow \infty}$. What I will say is that it’s not our main concern:

4. The Prime Number Theorem

Here’s the deal: we know ${\psi(x) \approx x}$, and want something as close as possible. We get the ${x}$ right off the bat, so we want everything else to be small.

The term ${-\frac{1}{2}\log(1-x^{-2})}$ is tiny, as is the constant. The ${R(x,T)}$ can be handled as long as ${T}$ is big enough: the price we pay is that we introduce more zeros into the sum over ${\rho}$. The sum of the zeros: well, what about the sum?

We know that for any zero ${\rho = \beta + i \gamma}$, we have ${\beta < 1}$. But if ${\beta = 1 - \varepsilon}$, we’d be very upset, because now our sum has a term of size ${\left\lvert x^\rho \right\rvert = x^\beta = x^{1-\varepsilon}}$ in it, which is bad since the thing we’re shooting for is ${\psi(x) \approx x}$. So what we’d like is to be able to force ${\beta}$ to be less than something, like ${0.9}$. Then we’d have an error term around ${x^{0.9}}$, which is not spectacular but better than ${x^{1-\varepsilon}}$.

In fact, we believe that ${\beta = \frac12}$, always — the Riemann Hypothesis. With some messing around with the value of ${T}$, we would then get an error term of

$\displaystyle O\left( x^{\frac12 + \varepsilon} \right)$

which is pretty good, and actually near best possible (one can show that ${O(\sqrt x)}$ is not achievable).

Unfortunately, we can not even show ${\beta < 0.999}$. The most we know is that ${\beta < 1 - \frac{c}{\log \gamma}}$, which gives some pretty crummy bounds compared to what we think is true: using this bound, the best we can do is

$\displaystyle \psi(x) = x + O\left( x \exp\left( -c\sqrt{\log x} \right) \right)$

which is worse than ${O(x^{0.999})}$. That’s the current state of affairs.

That ${\zeta}$ function is pretty mysterious.

\bigskip

References: Kedlaya’s 18.785 notes and Hildebrand’s ANT notes.