Holomorphic Logarithms and Roots

In this post we’ll make sense of a holomorphic square root and logarithm. Wrote this up because I was surprised how hard it was to find a decent complete explanation.

Let {f : U \rightarrow \mathbb C} be a holomorphic function. A holomorphic {n}th root of {f} is a function {g : U \rightarrow \mathbb C} such that {f(z) = g(z)^n} for all {z \in U}. A logarithm of {f} is a function {g : U \rightarrow \mathbb C} such that {f(z) = e^{g(z)}} for all {z \in U}. The main question we’ll try to figure out is: when do these exist? In particular, what if {f = \mathrm{id}}?

1. Motivation: Square Root of a Complex Number

To start us off, can we define {\sqrt z} for any complex number {z}?

The first obvious problem that comes up is that there for any {z}, there are two numbers {w} such that {w^2 = z}. How can we pick one to use? For our ordinary square root function, we had a notion of “positive”, and so we simply took the positive root.

Let’s expand on this: given { z = r \left( \cos\theta + i \sin\theta \right) } (here {r \ge 0}) we should take the root to be

\displaystyle w = \sqrt{r} \left( \cos \alpha + i \sin \alpha \right).

such that {2\alpha \equiv \theta \pmod{2\pi}}; there are two choices for {\alpha \pmod{2\pi}}, differing by {\pi}.

For complex numbers, we don’t have an obvious way to pick {\alpha}. Nonetheless, perhaps we can also get away with an arbitrary distinction: let’s see what happens if we just choose the {\alpha} with {-\frac{1}{2}\pi < \alpha \le \frac{1}{2}\pi}.

Pictured below are some points (in red) and their images (in blue) under this “upper-half” square root. The condition on {\alpha} means we are forcing the blue points to lie on the right-half plane.


Here, {w_i^2 = z_i} for each {i}, and we are constraining the {w_i} to lie in the right half of the complex plane. We see there is an obvious issue: there is a big discontinuity near the point {z_5} and {z_7}! The nearby point {w_6} has been mapped very far away. This discontinuity occurs since the points on the negative real axis are at the “boundary”. For example, given {-4}, we send it to {-2i}, but we have hit the boundary: in our interval {-\frac{1}{2}\pi \le \alpha < \frac{1}{2}\pi}, we are at the very left edge.

The negative real axis that we must not touch is is what we will later call a branch cut, but for now I call it a ray of death. It is a warning to the red points: if you cross this line, you will die! However, if we move the red circle just a little upwards (so that it misses the negative real axis) this issue is avoided entirely, and we get what seems to be a “nice” square root.


In fact, the ray of death is fairly arbitrary: it is the set of “boundary issues” that arose when we picked {-\frac{1}{2}\pi < \alpha \le \frac{1}{2}\pi}. Suppose we instead insisted on the interval {0 \le \alpha < \pi}; then the ray of death would be the positive real axis instead. The earlier circle we had now works just fine.


What we see is that picking a particular {\alpha}-interval leads to a different set of edge cases, and hence a different ray of death. The only thing these rays have in common is their starting point of zero. In other words, given a red circle and a restriction of {\alpha}, I can make a nice “square rooted” blue circle as long as the ray of death misses it.

So, what exactly is going on?

2. Square Roots of Holomorphic Functions

To get a picture of what’s happening, we would like to consider a more general problem: let {f: U \rightarrow \mathbb C} be holomorphic. Then we want to decide whether there is a {g : U \rightarrow \mathbb C} such that

\displaystyle f(z) = g(z)^2.

Our previous discussion when {f = \mathrm{id}} tells us we cannot hope to achieve this for {U = \mathbb C}; there is a “half-ray” which causes problems. However, there are certainly functions {f : \mathbb C \rightarrow \mathbb C} such that a {g} exists. As a simplest example, {f(z) = z^2} should definitely have a square root!

Now let’s see if we can fudge together a square root. Earlier, what we did was try to specify a rule to force one of the two choices at each point. This is unnecessarily strict. Perhaps we can do something like the following: start at a point in {z_0 \in U}, pick a square root {w_0} of {f(z_0)}, and then try to “fudge” from there the square roots of the other points. What do I mean by fudge? Well, suppose {z_1} is a point very close to {z_0}, and we want to pick a square root {w_1} of {f(z_1)}. While there are two choices, we also would expect {w_0} to be close to {w_1}. Unless we are highly unlucky, this should tells us which choice of {w_1} to pick. (Stupid concrete example: if I have taken the square root {-4.12i} of {-17} and then ask you to continue this square root to {-16}, which sign should you pick for {\pm 4i}?)

There are two possible ways we could get unlucky in the scheme above: first, if {w_0 = 0}, then we’re sunk. But even if we avoid that, we have to worry that we are in a situation, where we run around a full loop in the complex plane, and then find that our continuous perturbation has left us in a different place than we started. For concreteness, consider the following situation, again with {f = \mathrm{id}}:


We started at the point {z_0}, with one of its square roots as {w_0}. We then wound a full red circle around the origin, only to find that at the end of it, the blue arc is at a different place where it started!

The interval construction from earlier doesn’t work either: no matter how we pick the interval for {\alpha}, any ray of death must hit our red circle. The problem somehow lies with the fact that we have enclosed the very special point {0}.

Nevertheless, we know that if we take {f(z) = z^2}, then we don’t run into any problems with our “make it up as you go” procedure. So, what exactly is going on?

3. Covering Projections

By now, if you have read the part of algebraic topology. this should all seem very strangely familiar. The “fudging” procedure exactly describes the idea of a lifting.

More precisely, recall that there is a covering projection

\displaystyle (-)^2 : \mathbb C \setminus \{0\} \rightarrow \mathbb C \setminus \{0\}.

Let {V = \left\{ z \in U \mid f(z) \neq 0 \right\}}. For {z \in U \setminus V}, we already have the square root {g(z) = \sqrt{f(z)} = \sqrt 0 = 0}. So the burden is completing {g : V \rightarrow \mathbb C}.

Then essentially, what we are trying to do is construct a lifting {g} for the following diagram: cproj-squareOur map {p} can be described as “winding around twice”. From algebraic topology, we now know that this lifting exists if and only if

\displaystyle f_\ast``(\pi_1(V)) \subseteq p_\ast``(\pi_1(E))

is a subset of the image of {\pi_1(E)} by {p}. Since {B} and {E} are both punctured planes, we can identify them with {S^1}.

Ques 1

Show that the image under {p} is exactly {2\mathbb Z} once we identify {\pi_1(B) = \mathbb Z}.

That means that for any loop {\gamma} in {V}, we need {f \circ \gamma} to have an even winding number around {0 \in B}. This amounts to

\displaystyle \frac{1}{2\pi} \oint_\gamma \frac{f'}{f} \; dz \in 2\mathbb Z

since {f} has no poles.

Replacing {2} with {n} and carrying over the discussion gives the first main result.

Theorem 2 (Existence of Holomorphic {n}th Roots)

Let {f : U \rightarrow \mathbb C} be holomorphic. Then {f} has a holomorphic {n}th root if and only if

\displaystyle \frac{1}{2\pi i}\oint_\gamma \frac{f'}{f} \; dz \in n\mathbb Z

for every contour {\gamma} in {U}.

4. Complex Logarithms

The multivalued nature of the complex logarithm comes from the fact that

\displaystyle \exp(z+2\pi i) = \exp(z).

So if {e^w = z}, then any complex number {w + 2\pi i k} is also a solution.

We can handle this in the same way as before: it amounts to a lifting of the following diagram. cproj-expThere is no longer a need to work with a separate {V} since:

Ques 3

Show that if {f} has any zeros then {g} possibly can’t exist.

In fact, the map {\exp : \mathbb C \rightarrow \mathbb C\setminus\{0\}} is a universal cover, since {\mathbb C} is simply connected. Thus, {p``(\pi_1(\mathbb C))} is trivial. So in addition to being zero-free, {f} cannot have any winding number around {0 \in B} at all. In other words:

Theorem 4 (Existence of Logarithms)

Let {f : U \rightarrow \mathbb C} be holomorphic. Then {f} has a logarithm if and only if

\displaystyle \frac{1}{2\pi i}\oint_\gamma \frac{f'}{f} \; dz = 0

for every contour {\gamma} in {U}.

5. Some Special Cases

The most common special case is

Corollary 5 (Nonvanishing Functions from Simply Connected Domains)

Let {f : \Omega \rightarrow \mathbb C} be continuous, where {\Omega} is simply connected. If {f(z) \neq 0} for every {z \in \Omega}, then {f} has both a logarithm and holomorphic {n}th root.

Finally, let’s return to the question of {f = \mathrm{id}} from the very beginning. What’s the best domain {U} such that we can define {\sqrt{-} : U \rightarrow \mathbb C}? Clearly {U = \mathbb C} cannot be made to work, but we can do almost as well. For note that the only zero of {f = \mathrm{id}} is at the origin. Thus if we want to make a logarithm exist, all we have to do is make an incision in the complex plane that renders it impossible to make a loop around the origin. The usual choice is to delete negative half of the real axis, our very first ray of death; we call this a branch cut, with branch point at {0 \in \mathbb C} (the point which we cannot circle around). This gives

Theorem 6 (Branch Cut Functions)

There exist holomorphic functions

\displaystyle \begin{aligned} \log &: \mathbb C \setminus (-\infty, 0] \rightarrow \mathbb C \\ \sqrt[n]{-} &: \mathbb C \setminus (-\infty, 0] \rightarrow \mathbb C \end{aligned}

satisfying the obvious properties.

There are many possible choices of such functions ({n} choices for the {n}th root and infinitely many for {\log}); a choice of such a function is called a branch. So this is what is meant by a “branch” of a logarithm.

The principal branch is the “canonical” branch, analogous to the way we arbitrarily pick the positive branch to define {\sqrt{-} : \mathbb R_{\ge 0} \rightarrow \mathbb R_{\ge 0}}. For {\log}, we take the {w} such that {e^w = z} and the imaginary part of {w} lies in {(-\pi, \pi]} (since we can shift by integer multiples of {2\pi i}). Often, authors will write {\text{Log } z} to emphasize this choice.

Example 7

Let {U} be the complex plane minus the real interval {[0,1]}. Then the function {U \rightarrow \mathbb C} by {z \mapsto z(z-1)} has a holomorphic square root.

Corollary 8

A holomorphic function {f : U \rightarrow \mathbb C} has a holomorphic {n}th root for all {n \ge 1} if and only if it has a holomorphic logarithm.

Formal vs Functional Series (OR: Generating Function Voodoo Magic)

Epistemic status: highly dubious. I found almost no literature doing anything quite like what follows, which unsettles me because it makes it likely that I’m overcomplicating things significantly.

1. Synopsis

Recently I was working on an elegant problem which was the original problem 6 for the 2015 International Math Olympiad, which reads as follows:


[IMO Shortlist 2015 Problem C6] Let {S} be a nonempty set of positive integers. We say that a positive integer {n} is clean if it has a unique representation as a sum of an odd number of distinct elements from {S}. Prove that there exist infinitely many positive integers that are not clean.

Proceeding by contradiction, one can prove (try it!) that in fact all sufficiently large integers have exactly one representation as a sum of an even subset of {S}. Then, the problem reduces to the following:


Show that if {s_1 < s_2 < \dots} is an increasing sequence of positive integers and {P(x)} is a nonzero polynomial then we cannot have

\displaystyle \prod_{j=1}^\infty (1 - x^{s_j}) = P(x)

as formal series.

To see this, note that all sufficiently large {x^N} have coefficient {1 + (-1) = 0}. Now, the intuitive idea is obvious: the root {1} appears with finite multiplicity in {P} so we can put {P(x) = (1-x)^k Q(x)} where {Q(1) \neq 0}, and then we get that {1-x} on the RHS divides {P} too many times, right?

Well, there are some obvious issues with this “proof”: for example, consider the equality

\displaystyle 1 = (1-x)(1+x)(1+x^2)(1+x^4)(1+x^8) \dots.

The right-hand side is “divisible” by {1-x}, but the left-hand side is not (as a polynomial).

But we still want to use the idea of plugging {x \rightarrow 1^-}, so what is the right thing to do? It turns out that this is a complete minefield, and there are a lot of very subtle distinctions that seem to not be explicitly mentioned in many places. I think I have a complete answer now, but it’s long enough to warrant this entire blog post.

Here’s the short version: there’s actually two distinct notions of “generating function”, namely a “formal series” and “functional series”. They use exactly the same notation but are two different types of objects, and this ends up being the source of lots of errors, because “formal series” do not allow substituting {x}, while “functional series” do.

Spoiler: we’ll need the asymptotic for the partition function {p(n)}.

2. Formal Series {\neq} Functional Series

I’m assuming you’ve all heard the definition of {\sum_k c_kx^k}. It turns out unfortunately that this isn’t everything: there are actually two types of objects at play here. They are usually called formal power series and power series, but for this post I will use the more descriptive names formal series and functional series. I’ll do everything over {\mathbb C}, but one can of course use {\mathbb R} instead.

The formal series is easier to describe:

Definition 1

A formal series {F} is an infinite sequence {(a_n)_n = (a_0, a_1, a_2, \dots)} of complex numbers. We often denote it by {\sum a_nx^n = a_0 + a_1x + a_2x^2 + \dots}. The set of formal series is denoted {\mathbb C[ [x] ]}.

This is the “algebraic” viewpoint: it’s a sequence of coefficients. Note that there is no worry about convergence issues or “plugging in {x}”.

On the other hand, a functional series is more involved, because it has to support substitution of values of {x} and worry about convergence issues. So here are the necessary pieces of data:

Definition 2

A functional series {G} (centered at zero) is a function {G : U \rightarrow \mathbb C}, where {U} is an open disk centered at {0} or {U = \mathbb C}. We require that there exists an infinite sequence {(c_0, c_1, c_2, \dots)} of complex numbers satisfying

\displaystyle \forall z \in U: \qquad G(z) = \lim_{N \rightarrow \infty} \left( \sum_{k=0}^N c_k z^k \right).

(The limit is take in the usual metric of {\mathbb C}.) In that case, the {c_i} are unique and called the coefficients of {G}.

This is often written as {G(x) = \sum_n c_n x^n}, with the open set {U} suppressed.

Remark 3

Some remarks on the definition of functional series:

  • This is enough to imply that {G} is holomorphic (and thus analytic) on {U}.
  • For experts: note that I’m including the domain {U} as part of the data required to specify {G}, which makes the presentation cleaner. Most sources do something with “radius of convergence”; I will blissfully ignore this, leaving this data implicitly captured by {U}.
  • For experts: Perhaps non-standard, {U \neq \{0\}}. Otherwise I can’t take derivatives, etc.

Thus formal and functional series, despite having the same notation, have different types: a formal series {F} is a sequence, while a functional series {G} is a function that happens to be expressible as an infinite sum within its domain.

Of course, from every functional series {G} we can extract its coefficients and make them into a formal series {F}. So, for lack of better notation:

Definition 4

If {F = (a_n)_n} is a formal series, and {G : U \rightarrow \mathbb C} is a functional series whose coefficients equal {F}, then we write {F \simeq G}.

3. Finite operations

Now that we have formal and functional series, we can define sums. Since these are different types of objects, we will have to run definitions in parallel and then ideally check that they respect {\simeq}.

For formal series:

Definition 5

Let {F_1 = (a_n)_n} and {F_2 = (b_n)_n} be formal series. Then we set

\displaystyle \begin{aligned} (a_n)_n \pm (b_n)_n &= (a_n \pm b_n)_n \\ (a_n)_n \cdot (b_n)_n &= \left( \textstyle\sum_{j=0}^n a_jb_{n-j} \right)_n. \end{aligned}

This makes {\mathbb C[ [x] ]} into a ring, with identity {(0,0,0,\dots)} and {(1,0,0,\dots)}.

We also define the derivative {F = (a_n)_n} by {F' = ((n+1)a_{n+1})_n}.

It’s probably more intuitive to write these definitions as

\displaystyle \begin{aligned} \sum_n a_n x^n \pm \sum_n b_n x^n &= \sum_n (a_n \pm b_n) x^n \\ \left( \sum_n a_n x^n \right) \left( \sum_n b_n x^n \right) &= \sum_n \left( \sum_{j=0}^n a_jb_{n-j} \right) x^n \\ \left( \sum_n a_n x^n \right)' &= \sum_n na_n x^{n-1} \end{aligned}

and in what follows I’ll start to use {\sum_n a_nx^n} more. But officially, all definitions for formal series are in terms of the coefficients alone; these presence of {x} serves as motivation only.

Exercise 6

Show that if {F = \sum_n a_nx^n} is a formal series, then it has a multiplicative inverse if and only if {a_0 \neq 0}.

On the other hand, with functional series, the above operations are even simpler:

Definition 7

Let {G_1 : U \rightarrow \mathbb C} and {G_2 : U \rightarrow \mathbb C} be functional series with the same domain {U}. Then {G_1 \pm G_2} and {G_1 \cdot G_2} are defined pointwise.

If {G : U \rightarrow \mathbb C} is a functional series (hence holomorphic), then {G'} is defined poinwise.

If {G} is nonvanishing on {U}, then {1/G : U \rightarrow \mathbb C} is defined pointwise (and otherwise is not defined).

Now, for these finite operations, everything works as you expect:

Theorem 8 (Compatibility of finite operations)

Suppose {F}, {F_1}, {F_2} are formal series, and {G}, {G_1}, {G_2} are functional series {U \rightarrow \mathbb C}. Assume {F \simeq G}, {F_1 \simeq G_1}, {F_2 \simeq G_2}.

  • {F_1 \pm F_2 \simeq G_1 \pm G_2}, {F_1 \cdot F_2 = G_1 \cdot G_2}.
  • {F' \simeq G'}.
  • If {1/G} is defined, then {1/F} is defined and {1/F \simeq 1/G}.

So far so good: as long as we’re doing finite operations. But once we step beyond that, things begin to go haywire.

4. Limits

We need to start considering limits of {(F_k)_k} and {(G_k)_k}, since we are trying to make progress towards infinite sums and products. Once we do this, things start to burn.

Definition 9

Let {F_1 = \sum_n a_n x^n} and {F_2 = \sum_n b_n x^n} be formal series, and define the difference by

\displaystyle d(F_1, F_2) = \begin{cases} 2^{-n} & a_n \neq b_n, \; n \text{ minimal} \\ 0 & F_1 = F_2. \end{cases}

This function makes {\mathbb C[[x]]} into a metric space, so we can discuss limits in this space. Actually, it is a normed vector space obtained by {\left\lVert F \right\rVert = d(F,0)} above.

Thus, {\lim_{k \rightarrow \infty} F_k = F} if each coefficient of {x^n} eventually stabilizes as {k \rightarrow \infty}. For example, as formal series we have that {(1,-1,0,0,\dots)}, {(1,0,-1,0,\dots)}, {(1,0,0,-1,\dots)} converges to {1 = (1,0,0,0\dots)}, which we write as

\displaystyle \lim_{k \rightarrow \infty} (1 - x^k) = 1 \qquad \text{as formal series}.

As for functional series, since they are functions on the same open set {U}, we can use pointwise convergence or the stronger uniform convergence; we’ll say explicitly which one we’re doing.

Example 10 (Limits don’t work at all)

In what follows, {F_k \simeq G_k} for every {k}.

  • Here is an example showing that if {\lim_k F_k = F}, the functions {G_k} may not converge even pointwise. Indeed, just take {F_k = 1 - x^k} as before, and let {U = \{ z : |z| < 2 \}}.
  • Here is an example showing that even if {G_k \rightarrow G} uniformly, {\lim_k F_k} may not exist. Take {G_k = 1 - 1/k} as constant functions. Then {G_k \rightarrow 1}, but {\lim_k F_k} doesn’t exist because the constant term never stabilizes (in the combinatorial sense).
  • The following example from this math.SE answer by Robert Israel shows that it’s possible that {F = \lim_k F_k} exists, and {G_k \rightarrow G} pointwise, and still {F \not\simeq G}. Let {U} be the open unit disk, and set

    \displaystyle \begin{aligned} A_k &= \{z = r e^{i\theta} \mid 2/k \le r \le 1, \; 0 \le \theta \le 2\pi - 1/k\} \\ B_k &= \left\{ |z| \le 1/k \right\} \end{aligned}

    for {k \ge 1}. By Runge theorem there’s a polynomial {p_k(z)} such that

    \displaystyle |p_k(z) - 1/z^{k}| < 1/k \text{ on } A_k \qquad \text{and} \qquad |p_k(z)| < 1/k \text{ on }B_k.


    \displaystyle G_k(z) = z^{k+1} p_k(z)

    is the desired counterexample (with {F_k} being the sequence of coefficients from {G}). Indeed by construction {\lim_k F_k = 0}, since {\left\lVert F_k \right\rVert \le 2^{-k}} for each {k}. Alas, {|g_k(z) - z| \le 2/k} for {z \in A_k \cup B_k}, so {G_k \rightarrow z} converges pointwise to the identity function.

To be fair, we do have the following saving grace:

Theorem 11 (Uniform convergence and both limits exist is sufficient)

Suppose that {G_k \rightarrow G} converges uniformly. Then if {F_k \simeq G_k} for every {k}, and {\lim_k F_k = F}, then {F \simeq G}.

Proof: Here is a proof, copied from this math.SE answer by Joey Zhou. WLOG {G = 0}, and let {g_n(z) = \sum{a^{(n)}_kz^k}}. It suffices to show that {a_k = 0} for all {k}. Choose any {0<r<1}. By Cauchy’s integral formula, we have

\displaystyle \begin{aligned} \left|a_k - a^{(n)}_k\right| &= \left|\frac{1}{2\pi i} \int\limits_{|z|=r}{\frac{g(z)-g_n(z)}{z^{n+1}}\text{ d}z}\right| \\ & \le\frac{1}{2\pi}(2\pi r)\frac{1}{r^{n+1}}\max\limits_{|z|=r}{|g(z)-g_n(z)|} \xrightarrow{n\rightarrow\infty} 0 \end{aligned}

since {g_n} converges uniformly to {g} on {U}. Hence, {a_k = \lim\limits_{n\rightarrow\infty}{a^{(n)}_k}}. Since {a^{(n)}_k = 0} for {n\ge k}, the result follows. \Box

The take-away from this section is that limits are relatively poorly behaved.

5. Infinite sums and products

Naturally, infinite sums and products are defined by taking the limit of partial sums and limits. The following example (from math.SE again) shows the nuances of this behavior.

Example 12 (On {e^{1+x}})

The expression

\displaystyle \sum_{n=0}^\infty \frac{(1+x)^n}{n!} = \lim_{N \rightarrow \infty} \sum_{n=0}^N \frac{(1+x)^n}{n!}

does not make sense as a formal series: we observe that for every {N} the constant term of the partial sum changes.

But this does converge (uniformly, even) to a functional series on {U = \mathbb C}, namely to {e^{1+x}}.

Exercise 13

Let {(F_k)_{k \ge 1}} be formal series.

  • Show that an infinite sum {\sum_{k=1}^\infty F_k(x)} converges as formal series exactly when {\lim_k \left\lVert F_k \right\rVert = 0}.
  • Assume for convenience {F_k(0) = 1} for each {k}. Show that an infinite product {\prod_{k=0}^{\infty} (1+F_k)} converges as formal series exactly when {\lim_k \left\lVert F_k-1 \right\rVert = 0}.

Now the upshot is that one example of a convergent formal sum is the expression {\lim_{N} \sum_{n=0}^N a_nx^n} itself! This means we can use standard “radius of convergence” arguments to transfer a formal series into functional one.

Theorem 14 (Constructing {G} from {F})

Let {F = \sum a_nx^n} be a formal series and let

\displaystyle r = \frac{1}{\limsup_n \sqrt[n]{|c_n|}}.

If {r > 0} then there exists a functional series {G} on {U = \{ |z| < r \}} such that {F \simeq G}.

Proof: Let {F_k} and {G_k} be the corresponding partial sums of {c_0x^0} to {c_kx^k}. Then by Cauchy-Hadamard theorem, we have {G_k \rightarrow G} uniformly on (compact subsets of) {U}. Also, {\lim_k F_k = F} by construction. \Box

This works less well with products: for example we have

\displaystyle 1 \equiv (1-x) \prod_{j \ge 0} (1+x^{2^j})

as formal series, but we can’t “plug in {x=1}”, for example,

6. Finishing the original problem

We finally return to the original problem: we wish to show that the equality

\displaystyle P(x) = \prod_{j=1}^\infty (1 - x^{s_j})

cannot hold as formal series. We know that tacitly, this just means

\displaystyle \lim_{N \rightarrow \infty} \prod_{j=1}^N\left( 1 - x^{s_j} \right) = P(x)

as formal series.

Here is a solution obtained only by only considering coefficients, presented by Qiaochu Yuan from this MathOverflow question.

Both sides have constant coefficient {1}, so we may invert them; thus it suffices to show we cannot have

\displaystyle \frac{1}{P(x)} = \frac{1}{\prod_{j=1}^{\infty} (1 - x^{s_j})}

as formal power series.

The coefficients on the LHS have asymptotic growth a polynomial times an exponential.

On the other hand, the coefficients of the RHS can be shown to have growth both strictly larger than any polynomial (by truncating the product) and strictly smaller than any exponential (by comparing to the growth rate in the case where {s_j = j}, which gives the partition function {p(n)} mentioned before). So the two rates of growth can’t match.

Linnik’s Theorem for Sato-Tate Laws on CM Elliptic Curves

\title{A Variant of Linnik for Elliptic Curves} \maketitle

Here I talk about my first project at the Emory REU. Prerequisites for this post: some familiarity with number fields.

1. Motivation: Arithemtic Progressions

Given a property {P} about primes, there’s two questions we can ask:

  1. How many primes {\le x} are there with this property?
  2. What’s the least prime with this property?

As an example, consider an arithmetic progression {a}, {a+d}, \dots, with {a < d} and {\gcd(a,d) = 1}. The strong form of Dirichlet’s Theorem tells us that basically, the number of primes {\equiv a \pmod d} is {\frac 1d} the total number of primes. Moreover, the celebrated Linnik’s Theorem tells us that the first prime is {O(d^L)} for a fixed {L}, with record {L = 5}.

As I talked about last time on my blog, the key ingredients were:

  • Introducing Dirichlet characters {\chi}, which are periodic functions modulo {q}. One uses this to get the mod {q} into the problem.
  • Introducing an {L}-function {L(s, \chi)} attached to {\chi}.
  • Using complex analysis (Cauchy’s Residue Theorem) to boil the proof down to properties of the zeros of {L(s, \chi)}.

With that said, we now move to the object of interest: elliptic curves.

2. Counting Primes

Let {E} be an elliptic curve over {\mathbb Q}, which for our purposes we can think of concretely as a curve in Weirestrass form

\displaystyle y^2 = x^3 + Ax + B

where the right-hand side has three distinct complex roots (viewed as a polynomial in {x}). If we are unlucky enough that the right-hand side has a double root, then the curve ceases to bear the name “elliptic curve” and instead becomes singular.


Here’s a natural number theoretic question: for any rational prime {p}, how many solutions does {E} have modulo {p}?

To answer this it’s helpful to be able to think over an arbitrary field {F}. While we’ve written our elliptic curve {E} as a curve over {\mathbb Q}, we could just as well regard it as a curve over {\mathbb C}, or as a curve over {\mathbb Q(\sqrt 2)}. Even better, since we’re interested in counting solutions modulo {p}, we can regard this as a curve over {\mathbb F_p}. To make this clear, we will use the notation {E/F} to signal that we are thinking of our elliptic curve over the field {F}. Also, we write {\#E(F)} to denote the number of points of the elliptic curve over {F} (usually when {F} is a finite field). Thus, the question boils down to computing {\#E(\mathbb F_p)}.

Anyways, the question above is given by the famous Hasse bound, and in fact it works over any number field!

Theorem 1 (Hasse Bound)

Let {K} be a number field, and let {E/K} be an elliptic curve. Consider any prime ideal {\mathfrak p \subseteq \mathcal O_K} which is not ramified. Then we have

\displaystyle \#E(\mathbb F_\mathfrak p) = \mathrm{N}\mathfrak p + 1 - a_\mathfrak p

where {\left\lvert a_\mathfrak p \right\rvert \le 2\sqrt{\mathrm{N}\mathfrak p}}.

Here {\mathbb F_\mathfrak p = \mathcal O_K / \mathfrak p} is the field of {\mathrm{N}\mathfrak p} elements. The extra “{+1}” comes from a point at infinity when you complete the elliptic curve in the projective plane.

Here, the ramification means what you might guess. Associated to every elliptic curve over {\mathbb Q} is a conductor {N}, and a prime {p} is ramified if it divides {N}. The finitely many ramified primes are the “bad” primes for which something breaks down when we take modulo {p} (for example, perhaps the curve becomes singular).

In other words, for the {\mathbb Q} case, except for finitely many bad primes {p}, the number of solutions is {p + 1 + O(\sqrt p)}, and we even know the implied {O}-constant to be {2}.

Now, how do we predict the error term?

3. The Sato-Tate Conjecture

For elliptic curves over {\mathbb Q}, we the Sato-Tate conjecture (which recently got upgraded to a theorem) more or less answers the question. But to state it, I have to introduce a new term: an elliptic curve {E/\mathbb Q}, when regarded over {\mathbb C}, can have complex multiplication (abbreviated CM). I’ll define this in just a moment, but for now, the two things to know are

  • CM curves are “special cases”, in the sense that a randomly selected elliptic curve won’t have CM.
  • It’s not easy in general to tell whether a given elliptic curve has CM.

Now I can state the Sato-Tate result. It is most elegantly stated in terms of the following notation: if we define {a_p = p + 1 - \#E(\mathbb F_p)} as above, then there is a unique {\theta_p \in [0,\pi]} which obeys

\displaystyle a_p = 2 \sqrt p \cos \theta_p.

Theorem 2 (Sato-Tate)

Fix an elliptic curve {E/\mathbb Q} which does not have CM (when regarded over {\mathbb C}). Then as {p} varies across unramified primes, the asymptotic probability that {\theta_p \in [\alpha, \beta]} is

\displaystyle \frac{2}{\pi} \int_{[\alpha, \beta]} \sin^2\theta_p.

In other words, {\theta_p} is distributed according to the measure {\sin^2\theta}.

Now, what about the CM case?

4. CM Elliptic Curves

Consider an elliptic curve {E/\mathbb Q} but regard it as a curve over {\mathbb C}. It’s well known that elliptic curves happen to have a group law: given two points on an elliptic curve, you can add them to get a third point. (If you’re not familiar with this, Wikipedia has a nice explanation). So elliptic curves have more structure than just their set of points: they form an abelian group; when written in Weirerstrass form, the identity is the point at infinity.


Letting {A = (A, +)} be the associated abelian group, we can look at the endomorphisms of {E} (that is, homomorphisms {A \rightarrow A}). These form a ring, which we denote {\text{End }(E)}. An example of such an endomorphism is {a \mapsto n \cdot a} for an integer {n} (meaning {a+\dots+a}, {n} times). In this way, we see that {\mathbb Z \subseteq \text{End }(E)}.

Most of the time we in fact have {\text{End }(E) \cong \mathbb Z}. But on occasion, we will find that {\text{End }(E)} is congruent to {\mathcal O_K}, the ring of integers of a number field {K}. This is called complex multiplication by {K}.

Intuitively, this CM is special (despite being rare), because it means that the group structure associated to {E} has a richer set of symmetry. For CM curves over any number field, for example, the Sato-Tate result becomes very clean, and is considerably more straightforward to prove.

Here’s an example. The elliptic curve

\displaystyle E : y^2 = x^3 - 17 x

of conductor {N = 2^6 \cdot 17^2} turns out to have

\displaystyle \text{End }(E) \cong \mathbb Z[i]

i.e. it has complex multiplication has {\mathbb Z[i]}. Throwing out the bad primes {2} and {17}, we compute the first several values of {a_p}, and something bizarre happens. For the {3} mod {4} primes we get

\displaystyle \begin{aligned} a_{3} &= 0 \\ a_{7} &= 0 \\ a_{11} &= 0 \\ a_{19} &= 0 \\ a_{23} &= 0 \\ a_{31} &= 0 \end{aligned}

and for the {1} mod {4} primes we have

\displaystyle \begin{aligned} a_5 &= 4 \\ a_{13} &= 6 \\ a_{29} &= 4 \\ a_{37} &= 12 \\ a_{41} &= -8 \end{aligned}

Astonishingly, the vanishing of {a_p} is controlled by the splitting of {p} in {\mathbb Z[i]}! In fact, this holds more generally. It’s a theorem that for elliptic curves {E/\mathbb Q} with CM, we have {\text{End }(E) \cong \mathcal O_K} where {K} is some quadratic imaginary number field which is also a PID, like {\mathbb Z[i]}. Then {\mathcal O_K} governs how the {a_p} behave:

Theorem 3 (Sato-Tate Over CM)

Let {E/\mathbb Q} be a fixed elliptic curve with CM by {\mathcal O_K}. Let {\mathfrak p} be a unramified prime of {\mathcal O_K}.

  1. If {\mathfrak p} is inert, then {a_\mathfrak p = 0} (i.e. {\theta_\mathfrak p = \frac{1}{2}\pi}).
  2. If {\mathfrak p} is split, then {\theta_\mathfrak p} is uniform across {[0, \pi]}.

I’m told this is much easier to prove than the usual Sato-Tate.

But there’s even more going on in the background. If I look again at {a_p} where {p \equiv 1 \pmod 4}, I might recall that {p} can be written as the sum of squares, and construct the following table:

\displaystyle \begin{array}{rrl} p & a_p & x^2+y^2 \\ 5 & 4 & 2^2 + 1^2 \\ 13 & 6 & 3^2 + 2^2 \\ 29 & 4 & 2^2 + 5^2 \\ 37 & 12 & 6^2 + 1^2 \\ 41 & -8 & 4^2 + 5^2 \\ 53 & 14 & 7^2 + 2^2 \\ 61 & 12 & 6^2 + 5^2 \\ 73 & -16 & 8^2 + 3^2 \\ 89 & -10 & 5^2 + 8^2 \\ \end{array}

Each {a_p} is double one of the terms! There is no mistake: the {a_p} are also tied to the decomposition of {p = x^2+y^2}. And this works for any number field.

What’s happening? The main idea is that looking at a prime ideal {\mathfrak p = (x+yi)}, {a_\mathfrak p} is related to the argument of the complex number {x+yi} in some way. Of course, there are lots of questions unanswered (how to pick the {\pm} sign, and which of {x} and {y} to choose) but there’s a nice way to package all this information, as I’ll explain momentarily.

(Aside: I think the choice of having {x} be the odd or even number depends precisely on whether {p} is a quadratic residue modulo {17}, but I’ll have to check on that.)

5. {L}-Functions

I’ll just briefly explain where all this is coming from, and omit lots of details (in part because I don’t know all of them). Let {E/\mathbb Q} be an elliptic curve with CM by {\mathcal O_K}. We can define an associated {L}-function

\displaystyle L(s, E/K) = \prod_\mathfrak p \left( 1 - \frac{a_\mathfrak p}{(\mathrm{N}\mathfrak p)^{s+\frac{1}{2}}} + \frac{1}{(\mathrm{N}\mathfrak p)^{2s}} \right)

(actually this isn’t quite true actually, some terms change for ramified primes {\mathfrak p}).

At the same time there’s a notion of a Hecke Grössencharakter {\xi} on a number field {K} — a higher dimensional analog of the Dirichlet charaters we used on {\mathbb Z} to filter modulo {q}. For our purposes, think of it as a multiplicative function which takes in ideals of {\mathcal O_K} and returns complex numbers of norm {1}. Like Dirichlet characters, each {\xi} gets a Hecke {L}-function

\displaystyle L(s, \xi) = \prod_\mathfrak p \left( 1 - \frac{\xi(\mathfrak p)}{(\mathrm{N}\mathfrak p)^s} \right)

which again extends to a meromorphic function on the entire complex plane.

Now the great theorem is:

Theorem 4 (Deuring)

Let {E/\mathbb Q} have CM by {\mathcal O_K}. Then

\displaystyle L(s,E/K) = L(s, \xi)L(s, \overline{\xi})

for some Hecke Grössencharakter {\xi}.

Using the definitions given above and equating the Euler products at an unramified {\mathfrak p} gives

\displaystyle 1 - \frac{a_\mathfrak p}{(\mathrm{N}\mathfrak p)^{s+\frac{1}{2}}} + \frac{1}{(\mathrm{N}\mathfrak p)^{2s}} = \left( 1 - \frac{\xi(\mathfrak p)}{(\mathrm{N}\mathfrak p)^s} \right) \left( 1 - \frac{\overline{\xi(\mathfrak p)}}{(\mathrm{N}\mathfrak p)^s} \right)

Upon recalling that {a_\mathfrak p = 2 \sqrt{\mathrm{N}\mathfrak p} \cos \theta_\mathfrak p}, we derive

\displaystyle \xi(\mathfrak p) = \exp(\pm i \theta_\mathfrak p).

This is enough to determine the entire {\xi} since {\xi} is multiplicative.

So this is the result: let {E/\mathbb Q} be an elliptic curve of conductor {N}. Given our quadratic number field {K}, we define a map {\xi} from prime ideals of {\mathcal O_K} to the unit circle in {\mathbb C} by

\displaystyle \mathfrak p \mapsto \begin{cases} \exp(\pm i \theta_\mathfrak p) & \gcd(\mathrm{N}\mathfrak p, N) = 1 \\ 0 & \gcd(\mathrm{N}\mathfrak p, N) > 1. \end{cases}

Thus {\xi} is a Hecke Grössencharakter for some choice of {\pm} at each {\mathfrak p}.

It turns out furthermore that {\xi} has frequency {1}, which roughly means that the argument of {\xi\left( (\pi) \right)} is related to {1} times the argument of {\pi} itself. This fact is what explains the mysterious connection between the {a_p} and the solutions above.

6. Linnik-Type Result

With this in mind, I can now frame the main question: suppose we have an interval {[\alpha, \beta] \subset [0,\pi]}. What’s the first prime {p} such that {\theta_p \in [\alpha, \beta]}? We’d love to have some analog of Linnik’s Theorem here.

This was our project and the REU, and Ashvin, Peter and I proved that

Theorem 5

If a rational {E} has CM then the least prime {p} with {\theta_p \in [\alpha,\beta]} is

\displaystyle \ll \left( \frac{N}{\beta-\alpha} \right)^A.

I might blog later about what else goes into the proof of this. . . but Deuring’s result is one key ingredient, and a proof of an analogous theorem for non-CM curves would have to be very different.

Proof of Dirichlet’s Theorem on Arithmetic Progressions

In this post I will sketch a proof Dirichlet Theorem’s in the following form:

Theorem 1 (Dirichlet’s Theorem on Arithmetic Progression)


\displaystyle \psi(x;q,a) = \sum_{\substack{n \le x \\ n \equiv a \mod q}} \Lambda(n).

Let {N} be a positive constant. Then for some constant {C(N) > 0} depending on {N}, we have for any {q} such that {q \le (\log x)^N} we have

\displaystyle \psi(x;q,a) = \frac{1}{\phi(q)} x + O\left( x\exp\left(-C(N) \sqrt{\log x}\right) \right)

uniformly in {q}.

Prerequisites: complex analysis, previous two posts, possibly also Dirichlet characters. It is probably also advisable to read the last chapter of Hildebrand first, since this contains a much more thorough version of an easier version in which the zeros of {L}-functions are less involved.

Warning: I really don’t understand what I am saying. It is at least 50% likely that this post contains a major error, and 90% likely that there will be multiple minor errors. Please kindly point out any screw-ups of mine; thanks!

Throughout this post: {s = \sigma + it} and {\rho = \beta + i \gamma}, as always. All {O}-estimates have absolute constants unless noted otherwise, and {A \ll B} means {A = O(B)}, {A \asymp B} means {A \ll B \ll A}. By abuse of notation, {\mathcal L} will be short for either {\log q \left( \left\lvert t \right\rvert + 2 \right)} or {\log q \left( \left\lvert T \right\rvert + 2 \right)}, depending on context.

1. Outline

Here are the main steps:

  1. We introduce Dirichlet character {\chi : \mathbb N \rightarrow \mathbb C} which will serves as a roots of unity filter, extracting terms {\equiv a \pmod q}. We will see that this reduces the problem to estimating the function {\psi(x,\chi) = \sum_{n \le x} \chi(n) \Lambda(n)}.
  • Introduce the {L}-function {L(s, \chi)}, the generalization of {\zeta} for arithmetic progressions. Establish a functional equation in terms of {\xi(\chi,s)}, much like with {\zeta}, and use it to extend {L(s,\chi)} to a meromorphic function in the entire complex plane.
  • We will use a variation on the Perron transformation in order to transform this sum into an integral involving an {L}-function {L(\chi,s)}. We truncate this integral to {[c-iT, c+iT]}; this introduces an error {E_{\text{truncate}}} that can be computed immediately, though in this presentation we delay its computation until later.
  • We do a contour as in the proof of the Prime Number Theorem in order to estimate the above integral in terms of the zeros of {L(\chi, s)}. The main term emerges as a residue, so we want to show that the integral {E_{\text{contour}}} along this integral goes to zero. Moreover, we get some residues {\sum_\rho \frac{x^\rho}{\rho}} related to the zeros of the {L}-function.
  • By using Hadamard’s Theorem on {\xi(\chi,s)} which is entire, we can write {\frac{L'}{L}(s,\chi)} in terms of its zeros. This has three consequences:
    1. We can use the previous to get bounds on {\frac{L'}{L}(s, \chi)}.
    2. Using a 3-4-1 trick, this gives us information on the horizontal distribution of {\rho}; the dreaded Siegel zeros appear here.
    3. We can get an expression which lets us estimate the vertical distribution of the zeros in the critical strip (specifically the number of zeros with {\gamma \in [T-1, T+1]}).

    The first and third points let us compute {E_{\text{contour}}}.

  • The horizontal zero-free region gives us an estimate of {\sum_\rho \frac{x^\rho}{\rho}}, which along with {E_{\text{contour}}} and {E_{\text{truncate}}} gives us the value of {\psi(x,\chi)}.
  • We use Siegel’s Theorem to handle the potential Siegel zero that might arise.

Possibly helpful diagram:


The pink dots denote zeros; we think the nontrivial ones all lie on the half-line by the Generalized Riemann Hypothesis but they could actually be anywhere in the green strip.

2. Dirichlet Characters

2.1. Definitions

Recall that a Dirichlet character {\chi} modulo {q} is a completely multiplicative function {\chi : \mathbb N \rightarrow \mathbb C} which is also periodic modulo {q}, and vanishes for all {n} with {\gcd(n,q) > 1}. The trivial character (denoted {\chi_0}) is defined by {\chi_0(n) = 1} when {\gcd(n,q)=1} and {\chi_0(n) = 0} otherwise.

In particular, {\chi(1)=1} and thus each nonzero {\chi} value is a {\phi(q)}-th primitive root of unity; there are also exactly {\phi(q)} Dirichlet characters modulo {q}. Observe that {\chi(-1)^2 = \chi(1) = 1}, so {\chi(-1) = \pm 1}. We shall call {\chi} even if {\chi(1) = +1} and odd otherwise.

If {\tilde q \mid q}, then a character {\tilde\chi} modulo {\tilde q} induces a character {\chi} modulo {q} in a natural way: let {\chi = \tilde\chi} except at the points where {\gcd(n,q)>1} but {\gcd(n,\tilde q)=1}, letting {\chi} be zero at these points instead. (In effect, we are throwing away information about {\tilde\chi}.) A character {\chi} not induced by any smaller character is called primitive.

2.2. Orthogonality

The key fact about Dirichlet characters which will enable us to prove the theorem is the following trick:

Theorem 2 (Orthogonality of Dirichlet Characters)

We have

\displaystyle \sum_{\chi \mod q} \chi(a) \overline{\chi}(b) = \begin{cases} \phi(q) & \text{ if } a \equiv b \pmod q, \gcd(a,q) = 1 \\ 0 & \text{otherwise}. \end{cases}

(Here {\overline{\chi}} is the conjugate of {\chi}, which is essentially a multiplicative inverse.)

This is in some senses a slightly fancier form of the old roots of unity filter. Specifically, it is not too hard to show that {\sum_{\chi} \chi(n)} vanishes for {n \not\equiv 1 \pmod q} while it is equal to {\phi(q)} for {n \equiv 1 \pmod q}.

2.3. Dirichlet {L}-Functions

Now we can define the associated {L}-function by

\displaystyle L(\chi, s) = \sum_{n \ge 1} \chi(n) n^{-s} = \prod_p \left( 1-\chi(p) p^{-s} \right)^{-1}.

The properties of these {L}-functions are that

Theorem 3

Let {\chi} be a Dirichlet character modulo {q}. Then

  1. If {\chi \ne \chi_0}, {L(\chi, s)} can be extended to a holomorphic function on {\sigma > 0}.
  2. If {\chi = \chi_0}, {L(\chi, s)} can be extended to a meromorphic function on {\sigma > 0}, with a single simple pole at {s=1} of residue {\phi(q) / q}.

The proof is pretty much the same as for zeta.

Observe that if {q=1}, then {L(\chi, s) = \zeta(s)}.

2.4. The Functional Equation for Dirichlet {L}-Functions

While I won’t prove it here, one can show the following analog of the functional equation for Dirichlet {L}-functions.

Theorem 4 (The Functional Equation of Dirichlet {L}-Functions)

Assume that {\chi} is a character modulo {q}, possibly trivial or imprimitive. Let {a=0} if {\chi} is even and {a=1} if {\chi} is odd. Let

\displaystyle \xi(s,\chi) = q^{\frac{1}{2}(s+a)} \gamma(s,\chi) L(s,\chi) \left[ s(1-s) \right]^{\delta(x)}


\displaystyle \gamma(s,\chi) = \pi^{-\frac{1}{2}(s+a)} \Gamma\left( \frac{s+a}{2} \right)

and {\delta(\chi) = 1} if {\chi = \chi_0} and zero otherwise. Then

  1. {\xi} is entire.
  2. If {\chi} is primitive, then {\xi(s,\chi) = W(\chi)\xi(1-s, \overline{\chi})} for some complex number {\left\lvert W(\chi) \right\rvert = 1}.

Unlike the {\zeta} case, the {W(\chi)} is nastier to describe; computing it involves some Gauss sums that would be too involved for this post. However, I should point out that it is the Gauss sum here that requires {\chi} to be primitive. As before, {\xi} gives us an meromorphic continuation of {L(\chi, s)} in the entire complex plane. We obtain trivial zeros of {L(\chi, s)} as follows:

  • For {\chi} even, we get zeros at {-2}, {-4}, {-6} and so on.
  • For {\chi \neq \chi_0} even, we get zeros at {0}, {-2}, {-4}, {-6} and so on (since the pole of {\Gamma(\frac{1}{2} s)} at {s=0} is no longer canceled).
  • For {\chi} odd, we get zeros at {-1}, {-3}, {-5} and so on.

3. Obtaining the Contour Integral

3.1. Orthogonality

Using the trick of orthogonality, we may write

\displaystyle \begin{aligned} \psi(x;q,a) &= \sum_{n \le x} \frac{1}{\phi(q)} \sum_{\chi \mod q} \chi(n)\overline{\chi}(a) \Lambda(n) \\ &= \frac{1}{\phi(q)} \sum_{\chi \mod q} \overline{\chi}(a) \left( \sum_{n \le x} \chi(n) \Lambda(n) \right). \end{aligned}

To do this we have to estimate the sum {\sum_{n \le x} \chi(n) \Lambda(n)}.

3.2. Introducing the Logarithmic Derivative of the {L}-Function

First, we realize {\chi(n) \Lambda(n)} as the coefficients of a Dirichlet series. Recall last time we saw that {-\frac{\zeta'}{\zeta}} gave {\Lambda} as coefficients. We can do the same thing with {L}-functions: put

\displaystyle \log L(s, \chi) = -\sum_p \log \left( 1 - \chi(p) p^{-s} \right).

Taking the derivative, we obtain

Theorem 5

For any {\chi} (possibly trivial or imprimitive) we have

\displaystyle -\frac{L'}{L}(s, \chi) = \sum_{n \ge 1} \Lambda(n) \chi(n) n^{-s}.


\displaystyle \begin{aligned} -\frac{L'}{L}(s, \chi) &= \sum_p \frac{\log p}{1-\chi(p) p^{-s}} \\ &= \sum_p \log p \cdot \sum_{m \ge 1} \chi(p^m) (p^m)^{-s} \\ &= \sum_{n \ge 1} \Lambda(n) \chi(n) n^{-s} \end{aligned}

as desired. \Box

3.3. The Truncation Trick

Now, we unveil the trick at the heart of the proof of Perron’s Formula in the last post. I will give a more precise statement this time, by stating where this integral comes from:

Lemma 6 (Truncated Version of Perron Lemma)

For any {c,y,T > 0} define

\displaystyle I(y,T) = \frac{1}{2\pi i} \int_{c-iT}^{c+iT} \frac{y^s}{s} \; ds

Then {I(y,T) = \delta(y) + E(y,T)} where {\delta(y)} is the indicator function defined by

\displaystyle \delta(y) = \begin{cases} 0 & 0 < y < 1 \\ \frac{1}{2} & y=1 \\ 1 & y > 1 \end{cases}

and the error term {E(y,T)} is given by

\displaystyle \left\lvert E(y,T) \right\rvert < \begin{cases} y^c \min \left\{ 1, \frac{1}{T \left\lvert \log y \right\rvert} \right\} & y \neq 1 \\ cT^{-1} & y=1. \end{cases}

In particular, {I(y,\infty) = \delta(y)}.

In effect, the integral from {c-iT} to {c+iT} is intended to mimic an indicator function. We can use it to extract the terms of the Dirichlet series of {-\frac{L'}{L}(s, \chi)} which happen to have {n \le x}, by simply appealing to {\delta(x/n)}. Unfortunately, we cannot take {T = \infty} because later on this would introduce a sum which is not absolutely convergent, meaning we will have to live with the error term introduced by picking a particular finite value of {T}.

3.4. Applying the Truncation

Let’s do so: define

\displaystyle \psi(x;\chi) = \sum_{n \ge 1} \delta\left( x/n \right) \Lambda(n) \chi(n)

which is almost the same as {\sum_{n \le x} \Lambda(n) \chi(n)}, except that if {x} is actually an integer then {\Lambda(x)\chi(x)} should be halved (since {\delta(\frac{1}{2}) = \frac{1}{2}}). Now, we can substitute in our integral representation, and obtain

\displaystyle \begin{aligned} \psi(x;\chi) &= \sum_{n \ge 1} \Lambda(n) \chi(n) \cdot \left( E(x/n,T) + \int_{c-iT}^{c+iT} \frac{(x/n)^s}{s} \; ds \right) \\ &= \sum_{n \ge 1} \Lambda(n) \chi(n) \cdot E(x/n, T) + \int_{c-iT}^{c+iT} \sum_{n \ge 1} \left( \Lambda(n)\chi(n) n^{-s} \right) \frac{x^s}{s} \; ds \\ &= E_{\text{truncate}} + \int_{c-iT}^{c+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds \end{aligned}


\displaystyle E_{\text{truncate}} = \sum_{n \ge 1} \Lambda(n) \chi(n) \cdot E(x/n, T).

Estimating this is quite ugly, so we defer it to later.

4. Applying the Residue Theorem

4.1. Primitive Characters

Exactly like before, we are going to use a contour to estimate the value of

\displaystyle \int_{c-iT}^{c+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds.

Let {U} be a large half-integer (so no zeros of {L(\chi,s)} with {\text{Re } s = U}). We then re-route the integration path along the contour integral

\displaystyle c-iT \rightarrow -U-iT \rightarrow -U+iT \rightarrow c+iT.

During this process we pick up residues, which are the interesting terms.

First, assume that {\chi} is primitive, so the functional equation applies and we get the information we want about zeros.

  • If {\chi = \chi_0}, then so we pick up a residue of {+x} corresponding to

    \displaystyle (-1) \cdot -x^1/1 = +x.

    This is the “main term”. Per laziness, {\delta(\chi) x} it is.


  • Depending on whether {\chi} is odd or even, we detect the trivial zeros, which we can express succinctly by

    \displaystyle \sum_{m \ge 1} \frac{x^{a-2m}}{2m-a}

    Actually, I really ought to truncate this at {U}, but since I’m going to let {U \rightarrow \infty} in a moment I really don’t want to take the time to do so; the difference is negligible.

  • We obtain a residue of {-\frac{L'}{L}(s, \chi)} at {s = 0}, which we denote {b(\chi)}, for {s=0}. Observe that if {\chi} is even, this is the constant term of {-\frac{L'}{L}(s, \chi)} near {s=0} (but there is a pole of the whole function at {s=0}); otherwise it equals the value of {-\frac{L'}{L}(0, \chi)} straight-out.
  • If {\chi \ne \chi_0} is even then {L(s, \chi)} itself has a zero, so we are in worse shape. We recall that

    \displaystyle \frac{L'}{L}(s, \chi) = \frac 1 s + b(\chi) + \dots

    and notice that

    \displaystyle \frac{x^s}{s} = \frac 1s + \log x + \dots

    so we pick up an extra residue of {-\log x}. So, call this a bonus of {-(1-a) \log x}


  • Finally, the hard-to-understand zeros in the strip {0 < \sigma < 1}. If {\rho = \beta+i\gamma} is a zero, then it contributes a residue of {-\frac{x^\rho}{\rho}}. We only pick up the zeros with {\left\lvert \gamma \right\rvert < T} in our rectangle, so we get a term

    \displaystyle -\sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho}.

Letting {U \rightarrow \infty} we derive that

\displaystyle \begin{aligned} &\phantom= \int_{c-iT}^{c+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds \\ &= \delta(\chi) x + E_{\text{contour}} + \sum_{m \ge 1} \frac{x^{a-2m}}{2m-a} - b(\chi) - (1-a) \log x - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho} \end{aligned}

at least for primitive characters. Note that the sum over the zeros is not absolutely convergent without the restriction to {\left\lvert \gamma \right\rvert < T} (with it, the sum becomes a finite one).

4.2. Transition to nonprimitive characters

The next step is to notice that if {\chi} modulo {q} happens to be not primitive, and is induced by {\tilde\chi} with modulus {\tilde q}, then actually {\psi(x,\chi)} and {\psi(x,\tilde\chi)} are not so different. Specifically, they differ by at most

\displaystyle \begin{aligned} \left\lvert \psi(x,\chi)-\psi(x,\tilde\chi) \right\rvert &\le \sum_{\substack{\gcd(n,\tilde q)=1 \\ \gcd(n,q) > 1 \\ n \le x}} \Lambda(n) \\ &\le \sum_{\substack{\gcd(n,q) > 1 \\ n \le x}} \Lambda(n) \\ &\le \sum_{p \mid q} \sum_{\substack{p^k \le x}} \log p \\ &\le \sum_{p \mid q} \log x \\ &\le (\log q)(\log x) \end{aligned}

and so our above formula in fact holds for any character {\chi}, if we are willing to add an error term of {(\log q)(\log x)}. This works even if {\chi} is trivial, and also {\tilde q \le q}, so we will just simplify notation by omitting the tilde’s.

Anyways {(\log q)(\log x)} is piddling compared to all the other error terms in the problem, and we can swallow a lot of the boring residues into a new term, say

\displaystyle E_{\text{tiny}} \le (\log q + 1)(\log x) + 2.

Thus we have

\displaystyle \psi(x, \chi) = \delta(\chi) x + E_{\text{contour}} + E_{\text{truncate}} + E_{\text{tiny}} - b(\chi) - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho}.

Unfortunately, the constant {b(\chi)} depends on {\chi} and cannot be absorbed. We will also estimate {E_{\text{contour}}} in the error term party.

5. Distribution of Zeros

In order to estimate

\displaystyle \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho}

we will need information on both the vertical and horizontal distribution of the zeros. Also, it turns out this will help us compute {E_{\text{contour}}}.

5.1. Applying Hadamard’s Theorem

Let {\chi} be primitive modulo {q}. As we saw,

\displaystyle \xi(s,\chi) = (q/\pi)^{\frac{1}{2} s + \frac{1}{2} a} \Gamma\left( \frac{s+a}{2} \right) L(s, \chi) \left( s(1-s) \right)^{\delta(\chi)}

is entire. It also is easily seen to have order {1}, since no term grows much more than exponentially in {s} (using Stirling to handle the {\Gamma} factor). Thus by Hadamard, we may put

\displaystyle \xi(s, \chi) = e^{A(\chi)+B(\chi)z} \prod_\rho \left( 1-\frac{z}{\rho} \right) e^{\frac{z}{\rho}}.

Taking a logarithmic derivative and cleaning up, we derive the following lemma.

Lemma 7 (Hadamard Expansion of Logarithmic Derivative)

For any primitive character {\chi} (possibly trivial) we have

\displaystyle \begin{aligned} -\frac{L'}{L}(s, \chi) &= \frac{1}{2} \log\frac{q}{\pi} + \frac{1}{2}\frac{\Gamma'(\frac{1}{2} s + \frac{1}{2} a)}{\Gamma(\frac{1}{2} s + \frac{1}{2} a)} \\ &- B(\chi) - \sum_{\rho} \left( \frac{1}{s-\rho} + \frac{1}{\rho} \right) + \delta(\chi) \cdot \left( \frac{1}{s-1} + \frac 1s \right). \end{aligned}

Proof: One one hand, we have

\displaystyle \log \xi(s, \chi) = A(\chi) + B(\chi) s + \sum_\rho \left( \log \left( 1-\frac{s}{\rho} \right) + \frac{s}{\rho} \right).

On the other hand

\displaystyle \log \xi(s, \chi) = \frac{s+a}{2} \cdot \log \frac{q}{\pi} + \log \Gamma\left( \frac{s+a}{2} \right) + \log L(s, \chi) + \delta\chi(\log s + \log (1-s)).

Taking the derivative of both sides and setting them equal: we have on the left side

\displaystyle B(\chi) + \sum_{\rho} \left( \frac{1}{1-\frac{s}{\rho}} \cdot \frac{1}{-\rho} + \frac{1}{\rho} \right) = B(\chi) + \sum_\rho \left( \frac{1}{s-\rho} + \frac{1}{\rho} \right)

and on the right-hand side

\displaystyle \frac{1}{2} \log\frac{q}{\pi} + \frac{1}{2}\frac{\Gamma'}{\Gamma}\left( \frac{s+a}{2} \right) + \frac{L'}{L} (s, \chi) + \delta_\chi \left( \frac 1s + \frac{1}{s-1} \right).

Equating these gives the desired result. \Box

This will be useful in controlling things later. The {B(\chi)} is a constant that turns out to be surprisingly annoying; it is tied to {b(\chi)} from the contour, so we will need to deal with it.

5.2. A Bound on the Logarithmic Derivative

Frequently we will take the real part of this. Using Stirling, the short version of this is:

Lemma 8 (Logarithmic Derivative Bound)

Let {\sigma \ge 1} and {\chi} be primitive (possibly trivial). Then

\displaystyle \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] = \begin{cases} O(\mathcal L) - \text{Re } \sum_\rho \frac{1}{s-\rho} + \text{Re } \frac{\delta(\chi)}{s-1} & 1 \le \sigma \le 2 \\ O(\mathcal L) - \text{Re } \sum_\rho \frac{1}{s-\rho} & 1 \le \sigma \le 2, \left\lvert t \right\rvert \ge 2 \\ O(1) & \sigma \ge 2. \end{cases}

Proof: The claim is obvious for {\sigma \ge 2}, since we can then bound the quantity by {\frac{\zeta'(\sigma)}{\zeta(\sigma)} \le \frac{\zeta'(2)}{\zeta(2)}} due to the fact that the series representation is valid in that range. The second part with {\left\lvert t \right\rvert \ge 2} follows from the first line, by noting that {\text{Re } \frac{1}{s-1} < 1}. So it just suffices to show that

\displaystyle O(\mathcal L) - \text{Re } \sum_\rho \frac{1}{s-\rho} + \text{Re } \frac{\delta(\chi)}{s-1}

where {1 \le \sigma \le 2} and {\chi} is primitive.

First, we claim that {\text{Re } B(\chi) = - \text{Re } \sum \frac{1}{\rho}}. We use the following trick:

\displaystyle B(\chi) = \frac{\xi'(0,\chi)}{\xi(0,\chi)} = -\frac{\xi'(1,\overline{\chi})}{\xi(1,\overline{\chi})} = \overline{B(\chi)} - \sum_{\overline{\rho}} \left( \frac{1}{1-\overline{\rho}} + \frac{1}{\overline{\rho}} \right) + \frac{\delta(\chi)}{s-1}

where the ends come from taking the logarithmic derivative directly. By switching {1-\overline{\rho}} with {\rho}, the claim follows.

Then, the lemma follows rather directly; the {\text{Re } \sum_\rho \frac{1}{\rho}} has miraculously canceled with {\text{Re } B(\chi)}. To be explicit, we now have

\displaystyle - \text{Re } \frac{L'(s, \chi)}{L(s, \chi)} = \frac{1}{2} \log\frac{q}{\pi} + \frac{1}{2} \text{Re } \frac{\Gamma'(\frac{1}{2} s + \frac{1}{2} a)}{\Gamma(\frac{1}{2} s + \frac{1}{2} a)} - \sum_{\rho} \text{Re } \frac{1}{s-\rho} + \frac{\delta(\chi)}{s} + \frac{\delta(\chi)}{s-1}

and the first two terms contribute {\log q} and {\log t}, respectively; meanwhile the term {\frac{\delta(\chi)}{s}} is at most {1}, so it is absorbed. \Box

Short version: our functional equation lets us relate {L(s, \chi)} to {L(1-s, \chi)} for {\sigma \le 0} (in fact it’s all we have!) so this gives the following corresponding estimate:

Lemma 9 (Far-Left Estimate of Log Derivative)

If {\sigma \le -1} and {t \ge 2} we have

\displaystyle \frac{L'(s, \chi)}{L(s, \chi)} = O\left[ \log q\left\lvert s \right\rvert \right].


We have

\displaystyle L(1-s, \chi) = W(\chi) 2^{1-s} \pi^{-s} q^{s-\frac{1}{2}} \cos \frac{1}{2} \pi (s-a) \Gamma(s) L(s, \overline{\chi})

(the unsymmetric functional equation, which can be obtained from Legendre’s duplication formula). Taking a logarithmic derivative yields

\displaystyle \frac{L'}{L}(s, \chi) = \log \frac{q}{2\pi} - \frac{1}{2} \pi \tan \frac{1}{2} \pi(1-s-a) + \frac{\Gamma'}{\Gamma}(1-s) + \frac{L'}{L}(1-s, \overline{\chi}).

Because we assumed {\left\lvert t \right\rvert \ge 2}, the tangent function is bounded as {s} is sufficiently far from any of its poles along the real axis. Also since {\text{Re }(1-s) \ge 2} implies the {\frac{L'}{L}} term is bounded. Finally, the logarithmic derivative of {\Gamma} contributes {\log \left\lvert s \right\rvert} according to Stirling. So, total error is {O(\log q) + O(1) + O(\log \left\lvert s \right\rvert) + O(1)} and this gives the conclusion. \Box

5.3. Horizontal Distribution

I claim that:

Theorem 10 (Horizontal Distribution Bound)

Let {\chi} be a character, possibly trivial or imprimitive. There exists an absolute constant {c_1} with the following properties:

  1. If {\chi} is complex, then no zeros are in the region {\sigma \ge 1 - \frac{c_1}{\mathcal L}}.
  2. If {\chi} is real, there are no zeros in the region {\sigma \ge 1 - \frac{c_1}{\mathcal L}}, with at most one exception; this zero must be real and simple.

Such bad zeros are called Siegel zeros, and I will denote them {\beta_S}. The important part about this estimate is that it does not depend on {\chi} but rather on {q}. We need the relaxation to non-primitive characters, since we will use them in the proof of Landau’s Theorem.

Proof: First, assume {\chi} is both primitive and nontrivial.

By the 3-4-1 lemma on {\log L(\chi, s)} we derive that

\displaystyle 3 \text{Re } \left[ -\frac{L'(\sigma, \chi_0)}{L(\sigma, \chi_0)} \right] + 4 \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] + \text{Re } \left[ -\frac{L'(\sigma+2it, \chi^2)}{L(\sigma+2it, \chi^2)} \right] \ge 0.

This is cool because we already know that

\displaystyle \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] < O(\mathcal L) - \text{Re } \sum_\rho \frac{1}{s-\rho}

We now assume {\sigma > 1}.

In particular, we now have (since {\text{Re } \rho < 1} for any zero {\rho})

\displaystyle \text{Re } \frac{1}{s-\rho} > 0.

So we are free to throw out as many terms as we want.

If {\chi^2} is primitive, then everything is clear. Let {\rho = \beta + i \gamma} be a zero. Then

\displaystyle \begin{aligned} \text{Re } \left[ -\frac{L'(\sigma, \chi_0)}{L(\sigma, \chi_0)} \right] &\le \frac{1}{\sigma-1} + O(1) \\ \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] &\le O(\mathcal L) - \frac{1}{s-\rho} \\ \text{Re } \left[ -\frac{L'(\sigma+2it, \chi^2)}{L(\sigma+2it, \chi^2)} \right] &\le O(\mathcal L) \end{aligned}

where we have dropped all but one term for the second line, and all terms for the third line. If {\chi^2} is not primitive but at least is not {\chi_0}, then we can replace {\chi^2} with the inducing {\tilde\chi_2} for a penalty of at most

\displaystyle \begin{aligned} \text{Re } \frac{L'}{L}(s, \tilde\chi) - \text{Re } \frac{L'}{L}(s, \chi^2) &< \text{Re } \sum_{p^k \mid q} \tilde\chi(p^k) \log p \cdot \text{Re } (p^k)^{-s} \\ &< \sum_{p \mid q} \log p \cdot (p^{-\sigma} + p^{-2\sigma} + \dots) \\ &< \sum_{p \mid q} \log p \cdot 1 \\ &\le \log q \end{aligned}

just like earlier: {\Lambda} is usually zero, so we just look at the differing terms! The Dirichlet series really are practically the same. (Here we have also used the fact that {\sigma > 1}, and {p \ge 2}.)

Consequently, we derive using {3-4-1} that

\displaystyle \frac{3}{\sigma-1} - \frac{4}{s-\rho} + O(\mathcal L) \ge 0.

Selecting {s = \sigma + i \gamma} so that {s - \rho = \sigma-\beta}, we thus obtain

\displaystyle \frac{4}{\sigma-\beta} \le \frac{3}{\sigma-1} + O(\mathcal L).

If we select {\sigma = 1 + \frac{\varepsilon}{\mathcal L}}, we get

\displaystyle \frac{4}{1 + \frac{\varepsilon}{\mathcal L} - \beta} \le O(\mathcal L)


\displaystyle \beta < 1 - \frac{c_2}{\mathcal L}

for some constant {c_2}, initially only for primitive {\chi}.

But because the Euler product of the {L}-function of an imprimitive character versus its primitive inducing character differ by a finite number of zeros on the line {\sigma=0} it follows that this holds for all nontrivial complex characters.

Unfortunately, if we are unlucky enough that {\tilde\chi_2} is trivial, then replacing {\chi^2} causes all hell to break loose. (In particular, {\chi} is real in this case!) The problem comes in that our new penalty has an extra {\frac{1}{s-1}}, so

\displaystyle \left\lvert \text{Re } \frac{L'}{L}(s, \chi^2) - \text{Re } \frac{\zeta'}{\zeta}(s) \right\rvert < \frac{1}{s-1} + \log q

Applied with {s = \sigma + 2it}, we get the weaker

\displaystyle \frac{3}{\sigma-1} - \frac{4}{s-\rho} + O(\mathcal L) + \frac{1}{\sigma - 1 + 2it} \ge 0.

If {\left\lvert t \right\rvert > \frac{\delta}{\log q}} for some {\delta} then the {\frac{1}{\sigma-1+2it}} term will be at most {\frac{\log q}{\delta} = O(\mathcal L)} and we live to see another day. In other words, we have unconditionally established a zero-free region of the form

\displaystyle \sigma > 1 - \frac{c(\delta)}{\mathcal L} \quad\text{and}\quad \left\lvert t \right\rvert > \frac{\delta}{\log q}

for any {\delta > 0}.

Now let’s examine {\left\lvert t \right\rvert < \frac{\delta}{\log q}}. We don’t have the facilities to prove that there are no bad zeros, but let’s at least prove that the zero must be simple and real. By Hadamard at {t=0}, we have

\displaystyle -\frac{L'(\sigma, \chi)}{L(\sigma, \chi)} < O(\mathcal L) - \sum_\rho \frac{1}{\sigma-\rho}

where we no longer need the real parts since {\chi} is real, and in particular the roots of {L(s,\chi)} come in conjugate pairs. The left-hand side can be stupidly bounded below by

\displaystyle -\frac{L'(\sigma, \chi)}{L(\sigma, \chi)} \ge - \sum_{n \ge 1} (-1) \cdot \log n \cdot n^{-\sigma} = \frac{\zeta'(\sigma)}{\zeta(\sigma)} > -\frac{1}{\sigma-1} - O(1).


\displaystyle -\frac{1}{\sigma-1} < O(\mathcal L) - \sum_\rho \frac{1}{\sigma-\rho}.

In other words,

\displaystyle \sum_\rho \text{Re } \frac{\sigma-\rho}{\left\lvert \sigma-\rho \right\rvert^2} < \frac{1}{\sigma-1} + O(\mathcal L).

Then, let {\sigma = 1 + \frac{2\delta}{\log q}}, so

\displaystyle \sum_\rho \text{Re } \frac{\sigma-\rho}{\left\lvert \sigma-\rho \right\rvert^2} < \frac{\log q}{2\delta} + O(\mathcal L).

The rest is arithmetic; basically one finds that there can be at most one Siegel zero. In particular, since complex zeros come in conjugate pairs, that zero must be real.

It remains to handle the case that {\chi = \chi_0} is the constant function giving {1}. For this, we observe that the {L}-function in question is just {\zeta}. Thus, we can decrease the constant {c_2} to some {c_1} in such a way that the result holds true for {\zeta}, which completes the proof. \Box

5.4. Vertical Distribution

We have the following lemma:

Lemma 11 (Sum of Zeros Lemma)

For all real {t} and primitive characters {\chi} (possibly trivial), we have

\displaystyle \sum_\rho \frac{1}{4+(t-\gamma)^2} = O(\mathcal L).

Proof: We already have that

\displaystyle \text{Re } -\frac{L'}{L}(s, \chi) = O(\mathcal L) - \sum_\rho \text{Re } \frac{1}{s-\rho}

and we take {s = 2 + it}, noting that the left-hand side is bounded by a constant {\frac{\zeta'}{\zeta}(2) = -0.569961}. On the other hand, { \text{Re } \frac{1}{2+it-\rho} = \frac{\text{Re }(2+it-\rho)}{\left\lvert (2-\beta) + (t-\gamma)i \right\rvert^2} = \frac{2-\beta}{(2-\beta)^2+(t-\gamma)^2}} and

\displaystyle \frac{1}{4+(t-\gamma)^2} \le \frac{2-\beta}{(2-\beta)^2+(t-\gamma)^2} \le \frac{2}{1+(t-\gamma)^2}

as needed. \Box
From this we may deduce that

Lemma 12 (Number of Zeros Nearby {T})

For all real {t} and primitive characters {\chi} (possibly trivial), the number of zeros {\rho} with {\gamma \in [t-1, t+1]} is {O(\mathcal L)}.

In particular, we may perturb any given {T} by {\le 2} so that the distance between it and the nearest zero is at least {c_0 \mathcal L^{-1}}, for some absolute constant {c_0}.

From this, using an argument principle we can actually also obtain the following: For a real number {T > 0}, we have { N(T, \chi) = \frac{T}{\pi} \log \frac{qT}{2\pi e} + O(\mathcal L)} is the number of zeros of {L(s, \chi)} with imaginary part {\gamma \in [-T, T]}. However, we will not need this fact.

6. Error Term Party

Up to now, {c} has been arbitrary. Assume now {x \ge 6}; thus we can now follow the tradition

\displaystyle c = 1 + \frac{1}{\log x} < 2

so {c} is just to the right of the critical line. This causes {x^c = ex}. We assume also for convenience that {T \ge 2}.

6.1. Estimating the Truncation Error

Recall that

\displaystyle \left\lvert E(y,T) \right\rvert < \begin{cases} y^c \min \left\{ 1, \frac{1}{T \left\lvert \log y \right\rvert} \right\} & y \neq 1 \\ cT^{-1} & y=1. \end{cases}

We need to bound the right-hand side of

\displaystyle \left\lvert E_{\text{truncate}} \right\rvert \le \sum_{n \ge 1} \left\lvert \Lambda(n) \chi(n) \cdot E(x/n, T) \right\rvert = \sum_{n \ge 1} \Lambda(n) \left\lvert E(x/n, T) \right\rvert.

If {\frac34 x \le n \le \frac 54x}, the log part is small, and this is bad. We have to split into three cases: {\frac34 x \le n < x}, {n = x}, and {x < n \le \frac 54x}. This is necessary because in the event that {\Lambda(x) \neq 0} ({x} is a prime power), then {E(x/n,T) = E(1,T)} needs to be handled differently.

We let {x_{\text{left}}} and {x_{\text{right}}} be the nearest prime powers to {x} other than {x} itself. Thus this breaks our region to conquer into

\displaystyle \frac 34 x \le x_{\text{left}} < x < x_{\text{right}} \le \frac 54 x.

So we have possibly a center term (if {x} is a prime power, we have a term {n=x}), plus the far left interval and the far right interval. Let {d = \min\left\{ x-x_{\text{left}}, x_{\text{right}}-x \right\}} for convenience.

  • In the easy case, if {n = x} we have a contribution of {E(1,T) \log x < \frac{c}{T}\log x}, which is piddling (less than {\log x}).
  • Suppose {\frac 34x \le n \le x_{\text{left}} - 1}. If {n = x_{\text{left}} - a} for some integer {1 \le a \le \frac 14x}, then

    \displaystyle \log \frac xn \ge \log \frac{x_{\text{left}}}{x_{\text{left}}-a} = -\log\left( 1 - \frac{a}{x_{\text{left}}} \right) \ge \frac{a}{x_{\text{left}}}

    by using the silly inequality {-\log(1-t) \ge t} for {t < 1}. So the contribution in total is at most

    \displaystyle \begin{aligned} \sum_{1 \le a \le \frac 14 x} \Lambda(n) \cdot (x/n)^c \cdot \frac{1}{T \cdot \frac{a}{x_{\text{left}}}} &\le \frac{x_{\text{left}}}{T} \sum_{1 \le a \le \frac 14 x} \Lambda(n) \cdot \left( \frac 43 \right)^2 \frac 1a \\ &\le \frac{16}{9} \frac{x_{\text{left}}}{T} \log x \sum_{1 \le a \le \frac 14 x} \frac 1a \\ &\le \frac{16}{9} \frac{(x-1) (\log x)(\log \frac 14 x + 2)}{T} \\ &\le \frac{1.9x (\log x)^2}{T} \end{aligned}

    provided {x \ge 7391}.


  • If {n = x_{\text{left}}}, we have

    \displaystyle \log \frac xn = -\log\left( 1 - \frac{x-x_{\text{left}}}{x} \right) > \frac{d}{x}

    Hence in this case, we get an error at most

    \displaystyle \begin{aligned} \Lambda(x_{\text{left}}) \left( \frac{x}{x_{\text{left}}} \right)^c \min \left\{ 1, \frac{x}{Td} \right\} &< \Lambda(x_{\text{left}}) \left( \frac 43 \right)^2 \min \left\{ 1, \frac{x}{Td} \right\} \\ &\le \frac{16}{9} \log x \min \left\{ 1, \frac{x}{Td} \right\}. \end{aligned}


  • The cases {n = x_{\text{right}}} and {x_{\text{right}} + 1 \le n < \frac 54x} give the same bounds as above, in the same way.

Finally, if for {x} outside the interval mentioned above, we in fact have {\left\lvert \log x/n \right\rvert > \frac{1}{5}}, say, and so all terms contribute at most

\displaystyle \begin{aligned} \sum_n \Lambda(n) \cdot (x/n)^c \cdot \frac{1}{T \log \left\lvert x/n \right\rvert} &\le \frac{5x^c}{T} \sum_n \Lambda(n) \cdot n^{-c} \\ &= \frac{5ex}{T} \cdot \left\lvert -\frac{\zeta'}{\zeta} (c) \right\rvert \\ &< \frac{5ex}{T} \cdot \left( \frac{1}{c-1} + 0.5773 \right) \\ &\le \frac{14x \log x}{T}. \end{aligned}

(Recall {\zeta'/\zeta} had a simple pole at {s=1}, so near {s=1} it behaves like {\frac{1}{s-1}}.)

The sum of everything is {\le \frac{3.8x(\log x)^2+14x\log x}{T} + \frac{32}{9} \log x \min \left\{ 1, \frac{x}{Td} \right\}}. Hence, the grand total across all these terms is the horrible

\displaystyle \boxed{ E_{\text{truncate}} \le \frac{5x(\log x)^2}{T} + 3.6\log x \min \left\{ 1, \frac{x}{Td} \right\}}

provided {x \ge 1.2 \cdot 10^5}.

6.2. Estimating the Contour Error

We now need to measure the error along the contour, taken from {U \rightarrow \infty}. Throughout assume {U \ge 3}. Naturally, to estimate the integral, we seek good estimates on

\displaystyle \left\lvert \frac{L'}{L}(\sigma) \right\rvert.

For this we appeal to the Hadamard expansion. We break into a couple cases.

  • First, let’s look at the integral when {-1 \le \sigma \le 2}, so {s = \sigma \pm iT} with {T} large. We bound the horizontal integral along these regions; by symmetry let’s consider just the top

    \displaystyle \int_{-1+iT}^{c+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds.

    Thus we want an estimate of {-\frac{L'}{L}}.

    Lemma 13

    Let {s} be such that {-1 \le \sigma \le 2}, {\left\lvert t \right\rvert \ge 2}. Assume {\chi} is primitive (possibly trivial), and that {t} is not within {c_0\mathcal L^{-1}} of any zeros of {L(s, \chi)}. Then

    \displaystyle \frac{L'(s, \chi)}{L(s, \chi)} = O(\mathcal L^2)

    Proof: Since we assumed that {T \ge 2} we need not worry about {\frac{\delta(\chi)}{s-1}} and so we obtain

    \displaystyle \frac{L'(s, \chi)}{L(s, \chi)} = -\frac{1}{2} \log\frac{q}{\pi} - \frac{1}{2}\frac{\Gamma'(\frac{1}{2} s + \frac{1}{2} a)}{\Gamma(\frac{1}{2} s + \frac{1}{2} a)} + B(\chi) + \sum_{\rho} \left( \frac{1}{s-\rho} + \frac{1}{\rho} \right).

    and we eliminate {B(\chi)} by computing

    \displaystyle \frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} - \frac{L'(2+it, \chi)}{L(2+it, \chi)} = E_{\text{gamma}} + \sum_{\rho} \left( \frac{1}{\sigma+it-\rho} - \frac{1}{2+it-\rho} \right).


    \displaystyle E_{\text{gamma}} = \frac{1}{2}\frac{\Gamma'(\frac{1}{2} (2+it) + \frac{1}{2} a)}{\Gamma(\frac{1}{2} (2+it) + \frac{1}{2} a)} - \frac{1}{2}\frac{\Gamma'(\frac{1}{2} (\sigma+it) + \frac{1}{2} a)}{\Gamma(\frac{1}{2} (\sigma+it) + \frac{1}{2} a)} \ll \log T

    by Stirling (here we use the fact that {-1 \le \sigma \le 2}). For the terms where {\gamma \notin [t-1, t+1]} we see that

    \displaystyle \begin{aligned} \left\lvert \frac{1}{\sigma+it-\rho} - \frac{1}{2+it-\rho} \right\rvert &= \frac{2-\sigma}{\left\lvert \sigma+it-\rho \right\rvert \left\lvert 2+it-\rho \right\rvert} \\ &\le \frac{2-\sigma}{\left\lvert \gamma-t \right\rvert^2} \le \frac{3}{\left\lvert \gamma-t \right\rvert^2} \\ &\le \frac{6}{\left\lvert \gamma-t \right\rvert^2+1}. \end{aligned}

    So the contribution of the sum for {\left\lvert \gamma-t \right\rvert \ge 1} can be bounded by {O(\mathcal L)}, via the vertical sum lemma.

    As for the zeros with smaller imaginary part, we at least have {\left\lvert 2+it-\rho \right\rvert = \left\lvert 2-\beta \right\rvert > 1} and thus we can reduce the sum to just

    \displaystyle \frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} - \frac{L'(2+it, \chi)}{L(2+it, \chi)} = \sum_{\gamma\in[t-1,t+1]} \frac{1}{\sigma+it-\rho} + O(\mathcal L).

    Now by the assumption that {\left\lvert \gamma-t \right\rvert \ge c\mathcal L^{-1}}; so the terms of the sum are all at most {O(\mathcal L)}. Also, there are {O(\mathcal L)} zeros with imaginary part in that range. Finally, we recall that {\frac{L'(2+it, \chi)}{L(2+it, \chi)}} is bounded; we can write it using its (convergent) Dirichlet series and then note it is at most {\frac{\zeta'(2+it)}{\zeta(2+it)} \le \frac{\zeta'(2)}{\zeta(2)}}. \Box
    At this point, we perturb {T} as described in vertical distribution so that the lemma applies, and use can then compute

    \displaystyle \begin{aligned} \left\lvert \int_{-1+iT}^{c+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds \right\rvert &< O(\mathcal L^2) \cdot \int_{-1+iT}^{c+iT} \left\lvert \frac{x^s}{s} \right\rvert \; ds \\ &< O(\mathcal L^2) \int_{-1}^c \frac{x^\sigma}{2T} \; d\sigma \\ &< O(\mathcal L^2) \cdot \frac{x^{c+1}-1}{T \log x} \\ &< O\left(\frac{\mathcal L^2 x}{T \log x}\right). \end{aligned}


  • Next, for the integral {-U \le \sigma \le 1}, we use the “far-left” estimate to obtain

    \displaystyle \begin{aligned} \left\lvert \int_{-U+iT}^{-1+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds \right\rvert &\ll \int_{-\infty+iT}^{-1+iT} \left\lvert \frac{x^s}{s} \right\rvert \cdot \log q \left\lvert s \right\rvert \; ds \\ &\ll \int_{-\infty+iT}^{-1+iT} \left\lvert \frac{x^s}{s} \right\rvert \cdot \log q \left\lvert s \right\rvert \; ds \\ &\ll \log q \int_{-\infty+iT}^{-1+iT} \left\lvert \frac{x^s}{s} \right\rvert \; ds + \int_{-\infty+iT}^{-1+iT} \left\lvert \frac{x^s \log \left\lvert s \right\rvert}{s} \right\rvert \; ds \\ &< \log q \int_{-\infty+iT}^{-1+iT} \left\lvert \frac{x^s}{T} \right\rvert \; ds + \int_{-\infty}^{-1} \left\lvert \frac{x^s \log T}{T} \right\rvert \; ds \\ &\ll \frac{\log q}{T} \int_{-\infty}^{-1} x^\sigma \; d\sigma + \frac{\log T}{T} \int_{-\infty}^{-1} x^\sigma \; d\sigma \\ &< \frac{\mathcal L}{T} \left( \frac{x^{-1}}{\log x} \right) = \frac{\mathcal L}{T x \log x}. \end{aligned}

    So the contribution in this case is {O\left( \frac{\mathcal L}{T x \log x} \right)}.


  • Along the horizontal integral, we can use the same bound

    \displaystyle \begin{aligned} \left\lvert \int_{-U-iT}^{-U+iT} -\frac{L'}{L}(s, \chi) \frac{x^s}{s} \; ds \right\rvert &\ll \int_{-U-iT}^{-U+iT} \left\lvert \frac{x^s}{s} \right\rvert \cdot \log q \left\lvert s \right\rvert \; ds \\ &= x^{-U} \cdot \int_{-U-iT}^{-U+iT} \frac{\log q \left\lvert s \right\rvert}{\left\lvert s \right\rvert} \; ds \\ &= x^{-U} \cdot \int_{-U-iT}^{-U+iT} \frac{\log q + \log U}{U} \; ds \\ &= \frac{2T(\log q + \log U)}{Ux^U} \end{aligned}

    which vanishes as {U \rightarrow \infty}.

So we only have two error terms, {O\left( \frac{\mathcal L^2 x}{T \log x} \right)} and {O\left( \frac{\mathcal L}{Tx\log x} \right)}. The first is clearly larger, so we end with

\displaystyle \boxed{E_{\text{contour}} \ll \frac{\mathcal L^2x}{T \log x}}.

6.3. The term {b(\chi)}

We can estimate {b(\chi)} as follows:

Lemma 14

For primitive {\chi}. we have

\displaystyle b(\chi) = O(\log q) - \sum_{\left\lvert \gamma \right\rvert < 1} \frac{1}{\rho}

Proof: The idea is to look at {\frac{L'}{L}(s,\chi)-\frac{L'}{L}(2,\chi)}. By subtraction, we obtain

\displaystyle \begin{aligned} \frac{L'}{L}(s, \chi) -\frac{L'}{L}(2, \chi) &= - \frac{\Gamma'}{\Gamma} \left( \frac{s+a}{2} \right) + \frac{\Gamma'}{\Gamma} \left( \frac{2+a}{2} \right) \\ &- \frac rs - \frac r{s-1} + \frac r2 + \frac r1 \\ &+ \sum_\rho \left( \frac{1}{s-\rho} - \frac{1}{2-\rho} \right) \end{aligned}

Then at {s=0} (eliminating the poles), we have

\displaystyle \frac{L'}{L}(s, \chi) = O(1) - \sum_{\rho} \left( \frac{1}{\rho}+\frac{1}{2-\rho} \right)

where the {O(1)} is {\frac{L'}{L}(2,\chi) + \frac r2 + \gamma + \frac{\Gamma'}{\Gamma}(1)} if {a=0} and {\frac{L'}{L}(2,\chi) + \frac r2 - \frac{\Gamma'}{\Gamma}(\frac{1}{2}) + \frac{\Gamma'}{\Gamma}(\frac32)} for {a=1}. Furthermore,

\displaystyle \sum_{\rho, \left\lvert \gamma \right\rvert > 1} \left( \frac{1}{\rho}+\frac{1}{2-\rho} \right) \le \sum_{\rho, \left\lvert \gamma \right\rvert > 1} \frac{2}{\left\lvert \rho(2-\rho) \right\rvert} < 2 \sum_{\rho, \left\lvert \gamma \right\rvert > 1} \frac{1}{\left\lvert 2-\rho \right\rvert^2}

which is {O(\log q)} by our vertical distribution results, and similarly

\displaystyle \sum_{\rho, \left\lvert \gamma \right\rvert < 1} \frac{1}{2-\rho} = O(\log q).

This completes the proof. \Box

Let {\beta_1} be a Siegel zero, if any; for all the other zeros, we have that {\left\lvert \frac{1}{\rho} \right\rvert = \frac{1}{\beta^2+\gamma^2}}. We now have two cases.

  • {\overline{\chi} \neq \chi}. Then {\overline{\chi}} is complex and thus has no exceptional zeros; hence each of its zeros has {\beta < 1 - \frac{c}{\log q}}; since {\overline{\rho}} is a zero of {\overline{\chi}} if and only if {1-\rho} is a zero of {\chi}, it follows that all zeros of {\chi} are have {\left\lvert \frac{1}{\rho} \right\rvert < O(\log q)}. Moreover, in the range {\gamma \in [-1,1]} there are {O(\log q)} zeros (putting {T=0} in our earlier lemma on vertical distribution).

Thus, total contribution of the sum is {O\left( (\log q)^2 \right)}.


  • If {\overline{\chi} = \chi}, then {\chi} is real. The above argument goes through, except that we may have an extra Siegel zero at {\beta_S}; hence there will also be a special zero at {1 - \beta_S}. We pull these terms out separately.Consequently,

    \displaystyle \boxed{b(\chi) = O\left( (\log q)^2 \right) - \frac{1}{\beta_S} - \frac{1}{1-\beta_S}}.

    By adjusting the constant, we may assume {\beta_S > \frac{2014}{2015}} if it exists.

    7. Computing {\psi(x,\chi)} and {\psi(x;q,a)}

    7.1. Summing the Error Terms

    We now have, for any {T \ge 2}, {x \ge 6}, and {\chi} modulo {q} possibly primitive or trivial, the equality

    \displaystyle \psi(x, \chi) = \delta(\chi) x + E_{\text{contour}} + E_{\text{truncate}} + E_{\text{tiny}} - b(\chi) - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho}.


    \displaystyle \begin{aligned} E_{\text{contour}} &\ll \frac{x(\log x)^2}{T} + \log x \min \left\{ 1, \frac{x}{Td} \right\} \\ E_{\text{truncate}} &\ll \frac{\mathcal L^2 x}{T \log x} \\ E_{\text{tiny}} &\ll \log x \log q \\ b(\chi) &= O\left( (\log q)^2 \right) - \frac{1}{\beta_S} - \frac{1}{1-\beta_S}. \end{aligned}

    Assume now that {T \le x}, and {x} is an integer (hence {d \ge 1}). Then aggregating all the errors gives

    \displaystyle \psi(x, \chi) = \delta(\chi) x - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho} - \frac{x^{\beta_S}-1}{\beta_S} - \frac{x^{1-\beta_S}-1}{1-\beta_S} + O\left( \frac{x (\log qx)^2}{T} \right).

    where the sum over {\rho} now excludes the Siegel zero. We can omit the terms {\beta_S^{-1} < \frac{2015}{2014} = O(1)}, and also

    \displaystyle \frac{x^{1-\beta_S}-1}{1-\beta_S} < x^{\frac{1}{2015}} \log x.

    Absorbing things into the error term,

    \displaystyle \psi(x, \chi) = \delta(\chi) x - \frac{x^{\beta_S}}{\beta_S} - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho} + O\left( \frac{x (\log qx)^2}{T} + x^{\frac{1}{2015}} \log x \right).

    7.2. Estimating the Sum Over Zeros

    Now we want to estimate

    \displaystyle \sum_{\rho, \left\lvert \gamma \right\rvert < T} \left\lvert \frac{x^\rho}{\rho} \right\rvert.

    We do this is the dumbest way possible: putting a bound on {x^\rho} and pulling it out.

    For any non-Siegel zero, we have a zero-free region {\beta < 1 - \frac{c_1}{\mathcal L}}, whence

    \displaystyle \left\lvert x^\rho \right\rvert < x^{\beta} = x \cdot x^{\beta-1} = x \exp\left( \frac{-c_1 \log x}{\mathcal L} \right).

    Pulling this out, we can then estimate the reciprocals by using our differential:

    \displaystyle \begin{aligned} \sum_{\rho, \left\lvert \gamma \right\rvert < T} \left\lvert \frac{1}{\rho} \right\rvert < \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{1}{\gamma} < \sum_{t=1}^T \frac{\log q(t+2)}{t} \ll (\log qT)^2 \le (\log qx)^2. \end{aligned}


    \displaystyle \psi(x, \chi) = \delta(\chi) x - \frac{x^{\beta_S}}{\beta_S} + O\left( \frac{x (\log qx)^2}{T} + x^{\frac{1}{2015}} \log x + (\log qx)^2 \cdot x \exp\left( \frac{-c_1 \log x}{\mathcal L} \right) \right).

    We select

    \displaystyle T = \exp\left(c_3 \sqrt{\log x}\right)

    for some constant {c_3}, and moreover assume {q \le T}, then we obtain

    \displaystyle \psi(x, \chi) = \delta(\chi) x - \frac{x^{\beta_S}}{\beta_S} + O\left( x \exp\left( -c_4 \sqrt{\log x} \right) \right).

    7.3. Summing Up

    We would like to sum over all characters {\chi}. However, we’re worried that there might be lots of Siegel zeros across characters. A result of Landau tells us this is not the case:

    Theorem 15 (Landau)

    If {\chi_1} and {\chi_2} are real nontrivial primitive characters modulo {q_1} and {q_2}, then for any zeros {\beta_1} and {\beta_2} we have

    \displaystyle \min \left\{ \beta_1, \beta_2 \right\} < 1 - \frac{c_5}{\log q_1q_2}

    for some fixed absolute {c_5}. In particular, for any fixed {q}, there is at most one {\chi \mod q} with a Siegel zero.

    Proof: The character {\chi_1\chi_2} is not trivial, so we can put

    \displaystyle \begin{aligned} -\frac{\zeta'}{\zeta}(\sigma) &= \frac{1}{\sigma-1} + O(1) \\ -\frac{L'}{L}(\sigma, \chi_1\chi_2) &= O(\log q_1q_2) \\ -\frac{L'}{L}(\sigma, \chi_1) &= O(\log q_1) - \frac{1}{\sigma-\beta_i} \\ -\frac{L'}{L}(\sigma, \chi_2) &= O(\log q_2) - \frac{2}{\sigma-\beta_i}. \end{aligned}

    Now we use a silly trick:

    \displaystyle 0 \le -\frac{\zeta'}{\zeta}(\sigma) -\frac{L'}{L}(\sigma, \chi_1) -\frac{L'}{L}(\sigma, \chi_2) -\frac{L'}{L}(\sigma, \chi_1\chi_2)

    by “Simon’s Favorite Factoring Trick” (we use the deep fact that {(1+\chi_1)(1+\chi_2) \ge 0}, the analog of {3-4-1}). The upper bounds give now

    \displaystyle \frac{1}{\sigma-\beta_1} + \frac{1}{\sigma-\beta_2} < \frac{1}{\sigma-1} + O(\log q_1 \log q_2).

    and one may deduce the conclusion from here. \Box

    We now sum over all characters {\chi} as before to obtain

    \displaystyle \psi(x; q, a) = \frac{1}{\phi(q)} x - \frac{\chi_S(a)}{\phi(q)} \frac{x^{\beta_S}}{\beta_S} + O \left( x \exp\left(-c_6 \sqrt{\log x}\right) \right)

    where {\chi_S = \overline{\chi}_S} is the character with a Siegel zero, if it exists.

    8. Siegel’s Theorem, and Finishing Off

    The term with {x^{\beta_S} / \beta_S} is bad, and we need some way to get rid of it. We now appeal to Siegel’s Theorem:

    Theorem 16 (Siegel’s Theorem)

    For any {\varepsilon > 0} there is a {C_1(\varepsilon) > 0} such that any Siegel zero {\beta_S} satisfies

    \displaystyle \beta_S < 1-C_1(\varepsilon) q^{-\varepsilon}.

    Thus for a positive constant {N}, assuming {q \le (\log x)^N}, letting {\varepsilon = (2N)^{-1}} means {q^{-\varepsilon} > \frac{1}{\sqrt{\log x}}}, so we obtain

    \displaystyle x^{\beta_S} < x \exp\left( -C_1(\varepsilon) \log x q^{-\varepsilon} \right) < x \exp\left( -C_1(\varepsilon) \sqrt{\log x} \right).


    \displaystyle \begin{aligned} \psi(x; q, a) &= \frac{1}{\phi(q)} x - \frac{\chi_S(a)}{\phi(q)} \frac{x^{\beta_S}}{\beta_S} + O \left( x \exp\left(-c_6 \sqrt{\log x}\right) \right) \\ &\le \frac{1}{\phi(q)} x + O\left( x \exp\left( -C_1(\varepsilon) \sqrt{\log x} \right) \right) + O \left( x \exp\left(-c_6 \sqrt{\log x}\right) \right) \\ &\le \frac{1}{\phi(q)} x + O\left( x \exp\left( -C(N) \sqrt{\log x} \right) \right) \end{aligned}

    where {C(N) = \min \left\{ C_1(\varepsilon), c_6 \right\}}. This completes the proof of Dirichlet’s Theorem.


Zeros and Primes

Prerequisites for this post: previous post, and complex analysis. For this entire post, {s} is a complex variable with {s = \sigma + it}.

1. The {\Gamma} function

So there’s this thing called the Gamma function. Denoted {\Gamma(s)}, it is defined by

\displaystyle  \Gamma(s) = \int_0^{\infty} x^{s-1} e^{-x} \; dx

as long as {\sigma > 0}. Here are its values at the first few integers:

\displaystyle  \begin{aligned} \Gamma(1) &= 1 \\ \Gamma(2) &= 1 \\ \Gamma(3) &= 2 \\ \Gamma(4) &= 6 \\ \Gamma(5) &= 24. \end{aligned}

Yep: the {\Gamma} function is contrived so that the identity

\displaystyle  \Gamma(s+1) = s\Gamma(s)

always holds. (Proof: integration by parts.) Thus {\Gamma} interpolates the factorial function a nice way. Moreover, this identity lets us extend {\Gamma} to a meromorphic function on the entire complex plane.

We like the {\Gamma} function because, unlike {\zeta}, we know what it’s doing. For example we actually know how to compute {\Gamma(n)} for any positive integer {n}. (Contrast this with, say, {\zeta(3)}; we don’t even know if it’s an algebraic number, and it took until 1978 to prove that it was irrational). More to the point, we also know where all its zeros and poles are:

Proposition 1

Let {\Gamma} be defined as above, extended to a meromorphic function on all of {\mathbb C}.

  • The function {\Gamma} has no zeros.
  • It has a simple pole at {s= 0, -1, -2, \dots}, and these are the only poles.

The pole at {s=0} should not be surprising: plug in {s=0} to {\Gamma(s+1)=s\Gamma(s)}.

In any case, moral of story: {\Gamma} is very friendly!

2. Functional Equation for Zeta

We will now do something really magical, due to Riemann. Pick an integer {n}; we set {t = \pi n^2x} to derive the artificial equation

\displaystyle  \Gamma(s) = (n^2\pi)^{s} \int_0^{\infty} x^{s-1} e^{-n^2\pi x} \; dx.

Replacing {s} with {\frac{1}{2} s} and rearranging gives

\displaystyle  \pi^{-s/2} \cdot \Gamma(s/2) n^{-s} = \int_0^\infty x^{s/2 - 1} e^{-n^2\pi x} \; dx \qquad \sigma > 0.

Then, due to absolute convergence we can sum over {n \ge 1}; this brings the Riemann zeta function into the right-hand side, to get

\displaystyle  \pi^{-s/2} \cdot \Gamma(s/2) \zeta(s) = \int_0^\infty x^{s/2 - 1} \frac{\theta(x)-1}{2} \; dx \qquad \sigma > 1


\displaystyle  \theta(x) = \sum_{n=-\infty}^{\infty} e^{-n^2\pi x};

so that {\frac{\theta(x)-1}{2}} gives just the sum for {n \ge 1}. It turns out that this {\theta} is special: a Jacobi theta function, which happens to satisfy

\displaystyle  \theta(x^{-1}) = \sqrt{x} \cdot \theta(x) \quad \forall x > 0.

Also, {\omega(x) = O(e^{-\pi x})} for {x > 0}.

Using this and with some purely algebraic manipulations (namely, splitting the integral into two parts, {0 \le x \le 1} and {x \ge 1}, and then using the property of the theta function), one can derive that

\displaystyle  \pi^{-s/2}\Gamma(s/2)\zeta(s) = \frac1{s-1} - \frac 1s + \int_1^\infty \frac{\theta(x)-1}{2} \cdot \left( x^{s/2-1} + x^{(1-s)/2-1} \right) \; dx.

The right-hand side has two very useful properties:

  • It is even around {\frac{1}{2}}, meaning it remains the same when {s} is replaced by {1-s}.
  • The integral is actually an entire function on all of {\mathbb C} (the integral converges for all {s} because {\theta(x) = O(e^{-\pi x})}).

So if we multiply both sides by {\frac{1}{2} s(s-1)}, we get a symmetric entire function, called the {\xi} function:

Theorem 2

The function

\displaystyle  \xi(s) = \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(s/2) \zeta(s)


\displaystyle  \xi(s) = \xi(1-s)

and is an entire function on all of {\mathbb C}.

In particular we can use this to extend {\zeta} to a meromorphic function on the entire complex plane. We have

\displaystyle  \pi^{-\frac{1}{2} s} \Gamma\left( \frac s2 \right) \zeta(s) = \pi^{-\frac{1}{2} (1-s)} \Gamma\left( \frac{1-s}{2} \right) \zeta(1-s).

We can count zeros and poles from this. The {\pi}‘s don’t do anything. The {\Gamma}‘s have no zeros and just poles at non-positive integers.

Since {\zeta} has no zeros and no poles other than {s = 1} for {\sigma > 0}, the only zeros it will have are for {\sigma \le 0} are at {s = -2, -4, -6, \dots} in order to cancel out the poles of {\Gamma(s/2)}. And so these are the so-called trivial zeros.

On the other hand in the strip {0 < \sigma < 1} we get very confused. More on that later.

3. Explicit Formula for Chebyshev Function

Recall that last post we had

\displaystyle  \psi(x) = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} -\frac{\zeta'(s)}{\zeta(s)} \cdot \frac{x^s}{s} \; ds

according to Perron’s Formula. Write this as

\displaystyle  \psi(x) = \frac{1}{2\pi i} \lim_{T \rightarrow \infty} \left( \int_{c-iT}^{c+iT} \frac{\zeta'(s)}{\zeta(s)} \cdot -\frac{x^s}{s} \; ds \right)

Recall that if {f} is a meromorphic function then so is

\displaystyle  \frac{1}{2\pi i} \cdot \frac{f'}{f}

which has a simple pole for each zero/pole of {f}; the residue is {+1} for each simple zero (more generally {+m} for a zero of multiplicity {m}) and {-1} for each simple pole (more generally {-m} for a pole of order {m}). (See the argument principle).

Now we are going to do use the bestest theorem from complex analysis: Cauchy’s Residue Theorem. We replace the path {c-iT \rightarrow c+iT} with {c-iT \rightarrow -U-iT \rightarrow U + iT \rightarrow c+iT}. Doing so picks up the following residues:

  • Since {\zeta} has a pole at {s=1}, {\zeta'/\zeta} has a simple pole there of residue {-1}, so we pick up a residue of {+x} corresponding to

    \displaystyle  (-1) \cdot -x^1/1 = +x.

    This is the “main term”.

  • {\zeta} has simple zeros at {s=-2,-4,-6,\dots}. So we pick up residues of {-\frac{x^2}{2}}, {-\frac{x^4}{4}}, and so on, or

    \displaystyle  - \sum_{n \ge 1} \frac{x^{2n}}{2n} = -\frac{1}{2} \log(1-x^{-2}).

  • {x^s/s} itself has a pole at {s=0}, which gives us an additional term

    \displaystyle  -\frac{\zeta'(0)}{\zeta(0)}.

    It turns out that this equals {\log(2\pi)}, because why not.

  • Finally, the hard-to-understand zeros in the strip {0 < \sigma < 1}. If {\rho = \beta+i\gamma} is a zero, then it contributes a residue of {-\frac{x^\rho}{\rho}}. We only pick up the zeros with {\left\lvert \gamma \right\rvert < T} in our rectangle, so we get a term

    \displaystyle  \sum_{\substack{\rho, \left\lvert \gamma \right\rvert < T}} \frac{x^\rho}{\rho}

In what follows, {\rho} always denotes a zero in the critical strip, written {\rho = \beta + i \gamma}.

Putting this all together, we obtain that

\displaystyle  \psi(x) = \frac{1}{2\pi i} \lim_{U \rightarrow \infty} \lim_{T \rightarrow \infty} \left( x - \frac{1}{2} \log(1-x^{-2}) - \log(2\pi) - \sum_{\substack{\rho, \left\lvert \gamma \right\rvert < T}} \frac{x^\rho}{\rho} + I(x,U,T) \right)

where {I(x,U,T)} is the integral along the three sides of the rectangle. With some effort, you can show that in fact {I(x,U,T) \rightarrow 0}; but the speed of convergence depends on {x}. To avoid this, we can take {U \rightarrow \infty} but leave {T} intact, which gives the explicit formula:

Theorem 3 (Explicit Formula for the Chebyshev Function)

For {x \ge 2}

\displaystyle  \psi(x) = x - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho} - \log(2\pi) - \frac{1}{2}\log(1-x^{-2}) + R(x,T)

where {R(x,T)} is some error term, with the property that {R(x,T) \rightarrow 0} as {T \rightarrow \infty} for any fixed {x}.

(It’s necessary to use a truncated sum since the series {\sum_{\rho} \frac{1}{\left\lvert \rho \right\rvert}} actually converges only conditionally, and not absolutely.)

The error term is ugly enough that I haven’t included it in the formula, but if you want to know I’ll at least write it down: it is

\displaystyle  R(x,T) = O\left( \frac{x \log(xT)^2}{T} + \log x \min \left\{ 1, \frac{x}{T\left<x\right>} \right\} \right)

where {\left<x\right>} is the distance from {x} to the nearest prime power ({\ne x}). Clearly, for any fixed {x}, {R(x,T) \rightarrow 0} as {T \rightarrow \infty}. What I will say is that it’s not our main concern:

4. The Prime Number Theorem

Here’s the deal: we know {\psi(x) \approx x}, and want something as close as possible. We get the {x} right off the bat, so we want everything else to be small.

The term {-\frac{1}{2}\log(1-x^{-2})} is tiny, as is the constant. The {R(x,T)} can be handled as long as {T} is big enough: the price we pay is that we introduce more zeros into the sum over {\rho}. The sum of the zeros: well, what about the sum?

We know that for any zero {\rho = \beta + i \gamma}, we have {\beta < 1}. But if {\beta = 1 - \varepsilon}, we’d be very upset, because now our sum has a term of size {\left\lvert x^\rho \right\rvert = x^\beta = x^{1-\varepsilon}} in it, which is bad since the thing we’re shooting for is {\psi(x) \approx x}. So what we’d like is to be able to force {\beta} to be less than something, like {0.9}. Then we’d have an error term around {x^{0.9}}, which is not spectacular but better than {x^{1-\varepsilon}}.

In fact, we believe that {\beta = \frac12}, always — the Riemann Hypothesis. With some messing around with the value of {T}, we would then get an error term of

\displaystyle  O\left( x^{\frac12 + \varepsilon} \right)

which is pretty good, and actually near best possible (one can show that {O(\sqrt x)} is not achievable).

Unfortunately, we can not even show {\beta < 0.999}. The most we know is that {\beta < 1 - \frac{c}{\log \gamma}}, which gives some pretty crummy bounds compared to what we think is true: using this bound, the best we can do is

\displaystyle  \psi(x) = x + O\left( x \exp\left( -c\sqrt{\log x} \right) \right)

which is worse than {O(x^{0.999})}. That’s the current state of affairs.

That {\zeta} function is pretty mysterious.


References: Kedlaya’s 18.785 notes and Hildebrand’s ANT notes.