Zeros and Primes

Prerequisites for this post: previous post, and complex analysis. For this entire post, {s} is a complex variable with {s = \sigma + it}.

1. The {\Gamma} function

So there’s this thing called the Gamma function. Denoted {\Gamma(s)}, it is defined by

\displaystyle  \Gamma(s) = \int_0^{\infty} x^{s-1} e^{-x} \; dx

as long as {\sigma > 0}. Here are its values at the first few integers:

\displaystyle  \begin{aligned} \Gamma(1) &= 1 \\ \Gamma(2) &= 1 \\ \Gamma(3) &= 2 \\ \Gamma(4) &= 6 \\ \Gamma(5) &= 24. \end{aligned}

Yep: the {\Gamma} function is contrived so that the identity

\displaystyle  \Gamma(s+1) = s\Gamma(s)

always holds. (Proof: integration by parts.) Thus {\Gamma} interpolates the factorial function a nice way. Moreover, this identity lets us extend {\Gamma} to a meromorphic function on the entire complex plane.

We like the {\Gamma} function because, unlike {\zeta}, we know what it’s doing. For example we actually know how to compute {\Gamma(n)} for any positive integer {n}. (Contrast this with, say, {\zeta(3)}; we don’t even know if it’s an algebraic number, and it took until 1978 to prove that it was irrational). More to the point, we also know where all its zeros and poles are:

Proposition 1

Let {\Gamma} be defined as above, extended to a meromorphic function on all of {\mathbb C}.

  • The function {\Gamma} has no zeros.
  • It has a simple pole at {s= 0, -1, -2, \dots}, and these are the only poles.

The pole at {s=0} should not be surprising: plug in {s=0} to {\Gamma(s+1)=s\Gamma(s)}.

In any case, moral of story: {\Gamma} is very friendly!

2. Functional Equation for Zeta

We will now do something really magical, due to Riemann. Pick an integer {n}; we set {t = \pi n^2x} to derive the artificial equation

\displaystyle  \Gamma(s) = (n^2\pi)^{s} \int_0^{\infty} x^{s-1} e^{-n^2\pi x} \; dx.

Replacing {s} with {\frac{1}{2} s} and rearranging gives

\displaystyle  \pi^{-s/2} \cdot \Gamma(s/2) n^{-s} = \int_0^\infty x^{s/2 - 1} e^{-n^2\pi x} \; dx \qquad \sigma > 0.

Then, due to absolute convergence we can sum over {n \ge 1}; this brings the Riemann zeta function into the right-hand side, to get

\displaystyle  \pi^{-s/2} \cdot \Gamma(s/2) \zeta(s) = \int_0^\infty x^{s/2 - 1} \frac{\theta(x)-1}{2} \; dx \qquad \sigma > 1


\displaystyle  \theta(x) = \sum_{n=-\infty}^{\infty} e^{-n^2\pi x};

so that {\frac{\theta(x)-1}{2}} gives just the sum for {n \ge 1}. It turns out that this {\theta} is special: a Jacobi theta function, which happens to satisfy

\displaystyle  \theta(x^{-1}) = \sqrt{x} \cdot \theta(x) \quad \forall x > 0.

Also, {\omega(x) = O(e^{-\pi x})} for {x > 0}.

Using this and with some purely algebraic manipulations (namely, splitting the integral into two parts, {0 \le x \le 1} and {x \ge 1}, and then using the property of the theta function), one can derive that

\displaystyle  \pi^{-s/2}\Gamma(s/2)\zeta(s) = \frac1{s-1} - \frac 1s + \int_1^\infty \frac{\theta(x)-1}{2} \cdot \left( x^{s/2-1} + x^{(1-s)/2-1} \right) \; dx.

The right-hand side has two very useful properties:

  • It is even around {\frac{1}{2}}, meaning it remains the same when {s} is replaced by {1-s}.
  • The integral is actually an entire function on all of {\mathbb C} (the integral converges for all {s} because {\theta(x) = O(e^{-\pi x})}).

So if we multiply both sides by {\frac{1}{2} s(s-1)}, we get a symmetric entire function, called the {\xi} function:

Theorem 2

The function

\displaystyle  \xi(s) = \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(s/2) \zeta(s)


\displaystyle  \xi(s) = \xi(1-s)

and is an entire function on all of {\mathbb C}.

In particular we can use this to extend {\zeta} to a meromorphic function on the entire complex plane. We have

\displaystyle  \pi^{-\frac{1}{2} s} \Gamma\left( \frac s2 \right) \zeta(s) = \pi^{-\frac{1}{2} (1-s)} \Gamma\left( \frac{1-s}{2} \right) \zeta(1-s).

We can count zeros and poles from this. The {\pi}‘s don’t do anything. The {\Gamma}‘s have no zeros and just poles at non-positive integers.

Since {\zeta} has no zeros and no poles other than {s = 1} for {\sigma > 0}, the only zeros it will have are for {\sigma \le 0} are at {s = -2, -4, -6, \dots} in order to cancel out the poles of {\Gamma(s/2)}. And so these are the so-called trivial zeros.

On the other hand in the strip {0 < \sigma < 1} we get very confused. More on that later.

3. Explicit Formula for Chebyshev Function

Recall that last post we had

\displaystyle  \psi(x) = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} -\frac{\zeta'(s)}{\zeta(s)} \cdot \frac{x^s}{s} \; ds

according to Perron’s Formula. Write this as

\displaystyle  \psi(x) = \frac{1}{2\pi i} \lim_{T \rightarrow \infty} \left( \int_{c-iT}^{c+iT} \frac{\zeta'(s)}{\zeta(s)} \cdot -\frac{x^s}{s} \; ds \right)

Recall that if {f} is a meromorphic function then so is

\displaystyle  \frac{1}{2\pi i} \cdot \frac{f'}{f}

which has a simple pole for each zero/pole of {f}; the residue is {+1} for each simple zero (more generally {+m} for a zero of multiplicity {m}) and {-1} for each simple pole (more generally {-m} for a pole of order {m}). (See the argument principle).

Now we are going to do use the bestest theorem from complex analysis: Cauchy’s Residue Theorem. We replace the path {c-iT \rightarrow c+iT} with {c-iT \rightarrow -U-iT \rightarrow U + iT \rightarrow c+iT}. Doing so picks up the following residues:

  • Since {\zeta} has a pole at {s=1}, {\zeta'/\zeta} has a simple pole there of residue {-1}, so we pick up a residue of {+x} corresponding to

    \displaystyle  (-1) \cdot -x^1/1 = +x.

    This is the “main term”.

  • {\zeta} has simple zeros at {s=-2,-4,-6,\dots}. So we pick up residues of {-\frac{x^2}{2}}, {-\frac{x^4}{4}}, and so on, or

    \displaystyle  - \sum_{n \ge 1} \frac{x^{2n}}{2n} = -\frac{1}{2} \log(1-x^{-2}).

  • {x^s/s} itself has a pole at {s=0}, which gives us an additional term

    \displaystyle  -\frac{\zeta'(0)}{\zeta(0)}.

    It turns out that this equals {\log(2\pi)}, because why not.

  • Finally, the hard-to-understand zeros in the strip {0 < \sigma < 1}. If {\rho = \beta+i\gamma} is a zero, then it contributes a residue of {-\frac{x^\rho}{\rho}}. We only pick up the zeros with {\left\lvert \gamma \right\rvert < T} in our rectangle, so we get a term

    \displaystyle  \sum_{\substack{\rho, \left\lvert \gamma \right\rvert < T}} \frac{x^\rho}{\rho}

In what follows, {\rho} always denotes a zero in the critical strip, written {\rho = \beta + i \gamma}.

Putting this all together, we obtain that

\displaystyle  \psi(x) = \frac{1}{2\pi i} \lim_{U \rightarrow \infty} \lim_{T \rightarrow \infty} \left( x - \frac{1}{2} \log(1-x^{-2}) - \log(2\pi) - \sum_{\substack{\rho, \left\lvert \gamma \right\rvert < T}} \frac{x^\rho}{\rho} + I(x,U,T) \right)

where {I(x,U,T)} is the integral along the three sides of the rectangle. With some effort, you can show that in fact {I(x,U,T) \rightarrow 0}; but the speed of convergence depends on {x}. To avoid this, we can take {U \rightarrow \infty} but leave {T} intact, which gives the explicit formula:

Theorem 3 (Explicit Formula for the Chebyshev Function)

For {x \ge 2}

\displaystyle  \psi(x) = x - \sum_{\rho, \left\lvert \gamma \right\rvert < T} \frac{x^\rho}{\rho} - \log(2\pi) - \frac{1}{2}\log(1-x^{-2}) + R(x,T)

where {R(x,T)} is some error term, with the property that {R(x,T) \rightarrow 0} as {T \rightarrow \infty} for any fixed {x}.

(It’s necessary to use a truncated sum since the series {\sum_{\rho} \frac{1}{\left\lvert \rho \right\rvert}} actually converges only conditionally, and not absolutely.)

The error term is ugly enough that I haven’t included it in the formula, but if you want to know I’ll at least write it down: it is

\displaystyle  R(x,T) = O\left( \frac{x \log(xT)^2}{T} + \log x \min \left\{ 1, \frac{x}{T\left<x\right>} \right\} \right)

where {\left<x\right>} is the distance from {x} to the nearest prime power ({\ne x}). Clearly, for any fixed {x}, {R(x,T) \rightarrow 0} as {T \rightarrow \infty}. What I will say is that it’s not our main concern:

4. The Prime Number Theorem

Here’s the deal: we know {\psi(x) \approx x}, and want something as close as possible. We get the {x} right off the bat, so we want everything else to be small.

The term {-\frac{1}{2}\log(1-x^{-2})} is tiny, as is the constant. The {R(x,T)} can be handled as long as {T} is big enough: the price we pay is that we introduce more zeros into the sum over {\rho}. The sum of the zeros: well, what about the sum?

We know that for any zero {\rho = \beta + i \gamma}, we have {\beta < 1}. But if {\beta = 1 - \varepsilon}, we’d be very upset, because now our sum has a term of size {\left\lvert x^\rho \right\rvert = x^\beta = x^{1-\varepsilon}} in it, which is bad since the thing we’re shooting for is {\psi(x) \approx x}. So what we’d like is to be able to force {\beta} to be less than something, like {0.9}. Then we’d have an error term around {x^{0.9}}, which is not spectacular but better than {x^{1-\varepsilon}}.

In fact, we believe that {\beta = \frac12}, always — the Riemann Hypothesis. With some messing around with the value of {T}, we would then get an error term of

\displaystyle  O\left( x^{\frac12 + \varepsilon} \right)

which is pretty good, and actually near best possible (one can show that {O(\sqrt x)} is not achievable).

Unfortunately, we can not even show {\beta < 0.999}. The most we know is that {\beta < 1 - \frac{c}{\log \gamma}}, which gives some pretty crummy bounds compared to what we think is true: using this bound, the best we can do is

\displaystyle  \psi(x) = x + O\left( x \exp\left( -c\sqrt{\log x} \right) \right)

which is worse than {O(x^{0.999})}. That’s the current state of affairs.

That {\zeta} function is pretty mysterious.


References: Kedlaya’s 18.785 notes and Hildebrand’s ANT notes.

von Mangoldt and Zeta

Prerequisites for this post: definition of Dirichlet convolution, and big {O}-notation.

Normally I don’t like to blog about something until I’m pretty confident that I have a reasonably good understanding of what’s happening, but I desperately need to sort out my thoughts, so here I go\dots

1. Primes

One day, an alien explorer lands on Earth in a 3rd grade classroom. He hears the teacher talk about these things called primes. So he goes up to the teacher and asks “how many primes are there less than {x}?”.

Answer: “uh. . .”.

Maybe that’s too hard, so the alien instead asks “about how many primes are there less than {x}?”.

This is again greeted with silence. Confused, the alien asks a bunch of the teachers, who all respond similarly, but then someone mentions that in the last couple hundred years, someone was able to show with a lot of effort that the answer was pretty close to

\displaystyle  \approx \frac{x}{\log x}.

The alien, now more satisfied, then says “okay, great! How good is this estimate?”

More silence.

2. The von Mangoldt function

The prime counting function isn’t very nice, but there is a related function that’s a lot more well-behaved. We define the von Mangoldt function {\Lambda} by

\displaystyle  \Lambda(x) = \begin{cases} \log p & x = p^k \text{ for some prime } p \\ 0 & \text{otherwise}. \end{cases}

It’s worth remarking that in terms of (Dirichlet) convolution, we have

\displaystyle  \mathbf 1 \ast \Lambda = \log

(here {\mathbf 1} is the constant function that gives {\mathbf 1(n) = 1}). (Do you see why?) Then, we define the second Chebyshev function as

\displaystyle  \psi(x) = \sum_{n \le x} \Lambda(n).

In words, {\psi(x)} adds up logs of prime powers; in still other words, it is the partial sums of {\Lambda}.

It turns out that knowing {\psi(x)} well gives us information about the number of primes less than {x}, and vice versa. (This is actually not hard to show; try it yourself if you like.) But we like the function {\psi} because it is more well-behaved. In particular, it turns out the answer to the alien’s question “there are about {\frac{x}{\log x}} primes less than {x}” is equivalent to “{\psi(x) \approx x}”.

So to satisfy the alien, we have to establish {\psi(x) \approx x} and tell him how good this estimate is.

Actually, what we believe to be true is:

Theorem 1 (Riemann Hypothesis)

We conjecture that

\displaystyle  \psi(x) = x + O\left( x^{\frac{1}{2}+\varepsilon} \right)

for any {\varepsilon > 0}.

Unfortunately, what we actually know is far from this:

Theorem 2 (Prime Number Theorem, Classical Form)

We have proved that

\displaystyle  \psi(x) = x + O\left( x e^{-c\sqrt{\log x}} \right)

for some constant {c} (actually we have done slightly better, but not much).

You will notice that this error term is greater than {O(x^{0.999})}, and this is true even of the more modern estimates. In other words, we have a long way to go.

3. Dirichlet Series and Perron’s Formula

Note: I’m ignoring issues of convergence in this section, and will continue to do so for this post.

First, some vocabulary. An arithmetic function is just a function {\mathbb N \rightarrow \mathbb C}.

Example 3

Functions {\Lambda}, {\mathbf 1}, or {\log} are arithmetic functions.

The partial sums of an arithmetic function are sums like {f(1) + f(2) + \dots + f(n)}, or better yet {\sum_{n \le x} f(n)}.

Example 4

The Chebyshev function {\psi} gives the partial sums of {\Lambda}, by definition.

Example 5

The floor function {\left\lfloor x \right\rfloor} gives the partial sums of {\mathbf 1}:

\displaystyle  \left\lfloor x \right\rfloor = \sum_{n \le x} 1 = \sum_{n \le x} \mathbf 1(n).

Back to the main point. We are scared of the word “prime”, so in estimating {\psi(x)} we want to avoid doing so by any means possible. In light of this we introduce the Dirichlet series for an arithmetic function {f}, which is defined as

\displaystyle  F(s) = \sum_{n \ge 1} f(n) n^{-s}

for complex numbers {s}. This is like a generating function, except rather than {x^n}‘s we have {n^{-s}}‘s.

Why Dirichlet series over generating functions? There are two reasons why this turns out to be a really good move. The first is that in number theory, we often have convolutions, which play well with Dirichlet series:

Theorem 6 (Convolution of Dirichlet Series)

Let {f, g, h : \mathbb N \rightarrow \mathbb C} be arithmetic functions and let {F}, {G}, {H} be the corresponding Dirichlet series. Then

\displaystyle  f = g \ast h \implies F = G \cdot H.

(Here {\ast} is the Dirichlet convolution.)

This is actually immediate if you just multiply it out!

We want to use this to get a handle on the Dirichlet series for {\Lambda}. As remarked earlier, we have

\displaystyle  \log = \mathbf 1 \ast \Lambda.

The Dirichlet series of {\mathbf 1} has a name; it is the infamous Riemann zeta function, given by

\displaystyle  \zeta(s) = \sum_{n \ge 1} n^{-s}.

What about {\log}? Answer: it’s just {-\zeta'(s)}! This follows by term-wise differentiation of the sum {\zeta}, since the derivative of {n^{-s}} is {-\log s \cdot n^{-s}}.

Thus we have deduced

Theorem 7 (Dirichlet Series of von Mangoldt)

We have

\displaystyle  -\frac{\zeta'(s)}{\zeta(s)} = \sum_{n \ge 1} \Lambda(n) \cdot n^{-s}.

That was fun. Why do we care, though?

I promised a second reason, and here it is: Surprisingly, complex analysis gives us a way to link the Dirichlet series of a function with its partial sums (in this case, {\psi}). It is the so-called \beginPerron’s Formula}, which links partial sums to Dirichlet series:

Theorem 8 (Perron’s Formula)

Let {f : \mathbb N \rightarrow \mathbb C} be a function, {F} its Dirichlet series. Then for any {x} not an integer,

\displaystyle  \sum_{n \le x} f(n) = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} F(s) \cdot \frac{x^s}{s} \; ds

for any large enough {c} (large enough to avoid convergence issues).

Applied here this tells us that if {x} is not an integer we have

\displaystyle  \psi(x) \overset{\text{def}}{=} \sum_{n \le x} \Lambda(n) = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} -\frac{\zeta'(s)}{\zeta(s)} \cdot \frac{x^s}{s} \; ds.

for any {c > 1}.

This is fantastic, because we’ve managed to get rid of the sigma sign and the word “prime” from the entire problem: all we have to do is study the integral on the right-hand side. Right?

Ha, if it were that easy. That {\zeta} function is a strange beast.

4. The Riemann Zeta Function

Here’s the initial definition:

\displaystyle  \zeta(s) = \sum_{n \ge 1} n^{-s}

is the Dirichlet series of the constant function {\mathbf 1}. Unfortunately, this sum only converges when the real part of {s} is greater than {1}. (For {s=1}, it is the standard harmonic series, which explodes.)

However, we can use something called \beginAbel summation}, which transforms a Dirichlet series into an integral of its partial sums.

Theorem 9 (Abel Summation for Dirichlet Series)

If {f} is an arithmetic function and {F} is its Dirichlet series then

\displaystyle  F(s) = s \int_1^{\infty} \frac{\sum_{n \le x} f(n)}{x^{s+1}} \; dx.

It’s the opposite of Perron’s Formula earlier, which we used to transform partial sums into integrals in terms of the Dirichlet series. Unlike {\Lambda}, whose partial sums became the very beast {\psi} we were trying to tame, the partial sums of {\mathbf 1} are very easy to understand:

\displaystyle  \sum_{n \le x} \mathbf 1(n) = \sum_{n \le x} 1 = \left\lfloor x \right\rfloor.

It’s about as nice as can be!

Applying this to the Riemann zeta function and doing some calculation, we find that

\displaystyle  \zeta(s) = \frac{s}{s-1} - s \int_1^\infty \frac{ \left\{ x \right\} }{x^{s+1}} \; ds

where {\left\{ x \right\}} is the fractional part. It turns out that other than the explosion at {s=1}, this function will converge for any {s} whose real part is {> 0}. So this extends the Riemann zeta function to a function on half of the complex plane, minus a point (i.e. is a meromorphic function with a single pole at {s=1}).

5. Zeros of the Zeta Function

Right now I’ve only told you how to define {\zeta(s)} for {\mathrm{Re}\; s > 0}. In the next post I’ll outline how to push this even further to get the rest of the zeta function.

You might already be aware that the behavior of {\zeta} for {0 < \mathrm{Re}\; s < 1} has a large prize attached to it. For now, I’ll mention that

Theorem 10

If {\mathrm{Re}\; s \ge 1}, then {\zeta(s) \neq 0}.

Proof: Let {s = \sigma + it} be the real/imaginary parts (these letters are due to tradition). For {\sigma > 1}, we use the fact that we have an infinite product

\displaystyle  \zeta(s) = \prod_{p \text{ prime}} \left( 1+p^s+(p^2)^s + \dots \right) = \prod_{p \text{ prime}} \frac{1}{1-p^{-s}}.

Using the fact that {\sum_p \left\lvert p^{-s} \right\rvert < \sum_p p^{-\sigma} < \sum_{n \ge 1} n^{-\sigma} < \infty}, {\prod_p (1-p^{-s})}, converges to some finite value, say {L}. By standard facts on infinite products (for example, Appendix A.2 here) that means {\zeta(s)} is {1/L \neq 0}.

The situation for {\sigma = 1} is trickier. We use the following trick:

\displaystyle  3 + 4\cos\theta + \cos2\theta = 2(\cos\theta+1)^2 \ge 0 \quad \forall \theta.


\displaystyle  \log\zeta(s) = \sum_p -\log(1-p^{-s}) = \sum_p \sum_{n \ge 1} \frac{p^{-sn}}{n}

for all {\sigma > 1}. By looking term-by-term at the real parts and using the 3-4-1 inequality we obtain

\displaystyle  3\,\mathrm{Re}\, \log\zeta(\sigma) + 4\,\mathrm{Re}\, \log\zeta(\sigma+it) + \mathrm{Re}\, \log\zeta(\sigma+2it) \ge 0 \qquad \sigma > 1.


\displaystyle  \left\lvert \zeta(\sigma)^3\zeta(\sigma+it)^4\zeta(\sigma+2it) \right\rvert \ge 1.

Now suppose {1+it} was a zero ({t \neq 0}); let {\sigma \rightarrow 1^+}. Then we get a simple pole at {\zeta(1)}, repeated three times. However, we get a zero at {\zeta(1+it)}, repeated four times. There is no pole at {\zeta(1+2it)}, so the left-hand side is going to drop to zero, impossible. (The key point is the deep inequality {4 > 3}.) \Box

Next up: prime number theorem. References: Kedlaya’s 18.785 notes and Hildebrand’s ANT notes.