In this post I will sketch a proof Dirichlet Theorem’s in the following form:

**Theorem 1** **(Dirichlet’s Theorem on Arithmetic Progression)**

Let

Let be a positive constant. Then for some constant depending on , we have for any such that we have

uniformly in .

Prerequisites: complex analysis, previous two posts, possibly also Dirichlet characters. It is probably also advisable to read the last chapter of Hildebrand first, since this contains a much more thorough version of an easier version in which the zeros of -functions are less involved.

Warning: I *really* don’t understand what I am saying. It is at least 50% likely that this post contains a major error, and 90% likely that there will be multiple minor errors. Please kindly point out any screw-ups of mine; thanks!

Throughout this post: and , as always. All -estimates have absolute constants unless noted otherwise, and means , means . By abuse of notation, will be short for either or , depending on context.

## 1. Outline

Here are the main steps:

- We introduce
**Dirichlet character**which will serves as a roots of unity filter, extracting terms . We will see that this reduces the problem to estimating the function .

- Introduce the -function , the generalization of for arithmetic progressions. Establish a functional equation in terms of , much like with , and use it to extend to a meromorphic function in the entire complex plane.
- We will use a variation on the Perron transformation in order to transform this sum into an integral involving an -function . We truncate this integral to ; this introduces an error that can be computed immediately, though in this presentation we delay its computation until later.
- We do a contour as in the proof of the Prime Number Theorem in order to estimate the above integral in terms of the zeros of . The main term emerges as a residue, so we want to show that the integral along this integral goes to zero. Moreover, we get some residues related to the zeros of the -function.
- By using Hadamard’s Theorem on which is entire, we can write in terms of its zeros. This has three consequences:
- We can use the previous to get bounds on .
- Using a 3-4-1 trick, this gives us information on the horizontal distribution of ; the dreaded Siegel zeros appear here.
- We can get an expression which lets us estimate the vertical distribution of the zeros in the critical strip (specifically the number of zeros with ).

The first and third points let us compute .

- The horizontal zero-free region gives us an estimate of , which along with and gives us the value of .
- We use Siegel’s Theorem to handle the potential Siegel zero that might arise.

Possibly helpful diagram:

The pink dots denote zeros; we think the nontrivial ones all lie on the half-line by the Generalized Riemann Hypothesis but they could actually be anywhere in the green strip.

## 2. Dirichlet Characters

### 2.1. Definitions

Recall that a **Dirichlet character modulo ** is a completely multiplicative function which is also periodic modulo , and vanishes for all with . The **trivial character** (denoted ) is defined by when and otherwise.

In particular, and thus each nonzero value is a -th primitive root of unity; there are also exactly Dirichlet characters modulo . Observe that , so . We shall call **even** if and **odd** otherwise.

If , then a character modulo **induces** a character modulo in a natural way: let except at the points where but , letting be zero at these points instead. (In effect, we are throwing away information about .) A character not induced by any smaller character is called **primitive**.

### 2.2. Orthogonality

The key fact about Dirichlet characters which will enable us to prove the theorem is the following trick:

**Theorem 2** **(Orthogonality of Dirichlet Characters)**

We have

(Here is the conjugate of , which is essentially a multiplicative inverse.)

This is in some senses a slightly fancier form of the old roots of unity filter. Specifically, it is not too hard to show that vanishes for while it is equal to for .

### 2.3. Dirichlet -Functions

Now we can define the associated -function by

The properties of these -functions are that

**Theorem 3**

Let be a Dirichlet character modulo . Then

- If , can be extended to a holomorphic function on .
- If , can be extended to a meromorphic function on , with a single simple pole at of residue .

The proof is pretty much the same as for zeta.

Observe that if , then .

### 2.4. The Functional Equation for Dirichlet -Functions

While I won’t prove it here, one can show the following analog of the functional equation for Dirichlet -functions.

**Theorem 4** **(The Functional Equation of Dirichlet -Functions)**

Assume that is a character modulo , possibly trivial or imprimitive. Let if is even and if is odd. Let

where

and if and zero otherwise. Then

- is entire.
- If is primitive, then for some complex number .

Unlike the case, the is nastier to describe; computing it involves some Gauss sums that would be too involved for this post. However, I should point out that it is the Gauss sum here that requires to be primitive. As before, gives us an meromorphic continuation of in the entire complex plane. We obtain **trivial zeros** of as follows:

- For even, we get zeros at , , and so on.
- For even, we get zeros at , , , and so on (since the pole of at is no longer canceled).
- For odd, we get zeros at , , and so on.

## 3. Obtaining the Contour Integral

### 3.1. Orthogonality

Using the trick of orthogonality, we may write

To do this we have to estimate the sum .

### 3.2. Introducing the Logarithmic Derivative of the -Function

First, we realize as the coefficients of a Dirichlet series. Recall last time we saw that gave as coefficients. We can do the same thing with -functions: put

Taking the derivative, we obtain

**Theorem 5**

For any (possibly trivial or imprimitive) we have

*Proof:*

as desired.

### 3.3. The Truncation Trick

Now, we unveil the trick at the heart of the proof of Perron’s Formula in the last post. I will give a more precise statement this time, by stating where this integral comes from:

**Lemma 6** **(Truncated Version of Perron Lemma)**

For any define

Then where is the indicator function defined by

and the error term is given by

In particular, .

In effect, the integral from to is intended to mimic an indicator function. We can use it to extract the terms of the Dirichlet series of which happen to have , by simply appealing to . Unfortunately, we cannot take because later on this would introduce a sum which is not absolutely convergent, meaning we will have to live with the error term introduced by picking a particular finite value of .

### 3.4. Applying the Truncation

Let’s do so: define

which is almost the same as , except that if is actually an integer then should be halved (since ). Now, we can substitute in our integral representation, and obtain

where

Estimating this is quite ugly, so we defer it to later.

## 4. Applying the Residue Theorem

### 4.1. Primitive Characters

Exactly like before, we are going to use a contour to estimate the value of

Let be a large half-integer (so no zeros of with ). We then re-route the integration path along the contour integral

During this process we pick up residues, which are the interesting terms.

First, assume that is primitive, so the functional equation applies and we get the information we want about zeros.

- If , then so we pick up a residue of corresponding to
This is the “main term”. Per laziness, it is.

- Depending on whether is odd or even, we detect the trivial zeros, which we can express succinctly by
Actually, I really ought to truncate this at , but since I’m going to let in a moment I really don’t want to take the time to do so; the difference is negligible.

- We obtain a residue of at , which we denote , for . Observe that if is even, this is the constant term of near (but there is a pole of the whole function at ); otherwise it equals the value of straight-out.
- If is even then itself has a zero, so we are in worse shape. We recall that
and notice that

so we pick up an extra residue of . So, call this a bonus of

- Finally, the hard-to-understand zeros in the strip . If is a zero, then it contributes a residue of . We only pick up the zeros with in our rectangle, so we get a term

Letting we derive that

at least for primitive characters. Note that the sum over the zeros is not absolutely convergent without the restriction to (with it, the sum becomes a finite one).

### 4.2. Transition to nonprimitive characters

The next step is to notice that if modulo happens to be not primitive, and is induced by with modulus , then actually and are not so different. Specifically, they differ by at most

and so our above formula in fact holds for any character , if we are willing to add an error term of . This works even if is trivial, and also , so we will just simplify notation by omitting the tilde’s.

Anyways is piddling compared to all the other error terms in the problem, and we can swallow a lot of the boring residues into a new term, say

Thus we have

Unfortunately, the constant depends on and cannot be absorbed. We will also estimate in the error term party.

## 5. Distribution of Zeros

In order to estimate

we will need information on both the vertical and horizontal distribution of the zeros. Also, it turns out this will help us compute .

### 5.1. Applying Hadamard’s Theorem

Let be primitive modulo . As we saw,

is entire. It also is easily seen to have order , since no term grows much more than exponentially in (using Stirling to handle the factor). Thus by Hadamard, we may put

Taking a logarithmic derivative and cleaning up, we derive the following lemma.

**Lemma 7** **(Hadamard Expansion of Logarithmic Derivative)**

For any primitive character (possibly trivial) we have

*Proof:* One one hand, we have

On the other hand

Taking the derivative of both sides and setting them equal: we have on the left side

and on the right-hand side

Equating these gives the desired result.

This will be useful in controlling things later. The is a constant that turns out to be surprisingly annoying; it is tied to from the contour, so we will need to deal with it.

### 5.2. A Bound on the Logarithmic Derivative

Frequently we will take the real part of this. Using Stirling, the short version of this is:

**Lemma 8** **(Logarithmic Derivative Bound)**

Let and be primitive (possibly trivial). Then

*Proof:* The claim is obvious for , since we can then bound the quantity by due to the fact that the series representation is valid in that range. The second part with follows from the first line, by noting that . So it just suffices to show that

where and is primitive.

First, we claim that . We use the following trick:

where the ends come from taking the logarithmic derivative directly. By switching with , the claim follows.

Then, the lemma follows rather directly; the has miraculously canceled with . To be explicit, we now have

and the first two terms contribute and , respectively; meanwhile the term is at most , so it is absorbed.

Short version: our functional equation lets us relate to for (in fact it’s all we have!) so this gives the following corresponding estimate:

**Lemma 9** **(Far-Left Estimate of Log Derivative)**

If and we have

*Proof:*

We have

(the unsymmetric functional equation, which can be obtained from Legendre’s duplication formula). Taking a logarithmic derivative yields

Because we assumed , the tangent function is bounded as is sufficiently far from any of its poles along the real axis. Also since implies the term is bounded. Finally, the logarithmic derivative of contributes according to Stirling. So, total error is and this gives the conclusion.

### 5.3. Horizontal Distribution

I claim that:

**Theorem 10** **(Horizontal Distribution Bound)**

Let be a character, possibly trivial or imprimitive. There exists an absolute constant with the following properties:

- If is complex, then no zeros are in the region .
- If is real, there are no zeros in the region , with at most one exception; this zero must be real and simple.

Such bad zeros are called **Siegel zeros**, and I will denote them . The important part about this estimate is that it does not depend on but rather on . We need the relaxation to non-primitive characters, since we will use them in the proof of Landau’s Theorem.

*Proof:* First, assume is both primitive and nontrivial.

By the 3-4-1 lemma on we derive that

This is cool because we already know that

We now assume .

In particular, we now have (since for any zero )

So we are free to throw out as many terms as we want.

If is primitive, then everything is clear. Let be a zero. Then

where we have dropped all but one term for the second line, and all terms for the third line. If is not primitive but at least is not , then we can replace with the inducing for a penalty of at most

just like earlier: is usually zero, so we just look at the differing terms! The Dirichlet series really are practically the same. (Here we have also used the fact that , and .)

Consequently, we derive using that

Selecting so that , we thus obtain

If we select , we get

so

for some constant , initially only for primitive .

But because the Euler product of the -function of an imprimitive character versus its primitive inducing character differ by a finite number of zeros on the line it follows that this holds for all nontrivial complex characters.

Unfortunately, if we are unlucky enough that is trivial, then replacing causes all hell to break loose. (In particular, is real in this case!) The problem comes in that our new penalty has an extra , so

Applied with , we get the weaker

If for some then the term will be at most and we live to see another day. In other words, we have unconditionally established a zero-free region of the form

for any .

Now let’s examine . We don’t have the facilities to prove that there are no bad zeros, but let’s at least prove that the zero must be simple and real. By Hadamard at , we have

where we no longer need the real parts since is real, and in particular the roots of come in conjugate pairs. The left-hand side can be stupidly bounded below by

So

In other words,

Then, let , so

The rest is arithmetic; basically one finds that there can be at most one Siegel zero. In particular, since complex zeros come in conjugate pairs, that zero must be real.

It remains to handle the case that is the constant function giving . For this, we observe that the -function in question is just . Thus, we can decrease the constant to some in such a way that the result holds true for , which completes the proof.

### 5.4. Vertical Distribution

We have the following lemma:

**Lemma 11** **(Sum of Zeros Lemma)**

For all real and primitive characters (possibly trivial), we have

*Proof:* We already have that

and we take , noting that the left-hand side is bounded by a constant . On the other hand, and

as needed.

From this we may deduce that

**Lemma 12** **(Number of Zeros Nearby )**

For all real and primitive characters (possibly trivial), the number of zeros with is .

In particular, we may perturb any given by so that the distance between it and the nearest zero is at least , for some absolute constant .

From this, using an argument principle we can actually also obtain the following: For a real number , we have is the number of zeros of with imaginary part . However, we will not need this fact.

## 6. Error Term Party

Up to now, has been arbitrary. Assume now ; thus we can now follow the tradition

so is just to the right of the critical line. This causes . We assume also for convenience that .

### 6.1. Estimating the Truncation Error

Recall that

We need to bound the right-hand side of

If , the log part is small, and this is bad. We have to split into three cases: , , and . This is necessary because in the event that ( is a prime power), then needs to be handled differently.

We let and be the nearest prime powers to other than itself. Thus this breaks our region to conquer into

So we have possibly a center term (if is a prime power, we have a term ), plus the far left interval and the far right interval. Let for convenience.

- In the easy case, if we have a contribution of , which is piddling (less than ).
- Suppose . If for some integer , then
by using the silly inequality for . So the contribution in total is at most

provided .

- If , we have
Hence in this case, we get an error at most

- The cases and give the same bounds as above, in the same way.

Finally, if for outside the interval mentioned above, we in fact have , say, and so all terms contribute at most

(Recall had a simple pole at , so near it behaves like .)

The sum of everything is . Hence, the grand total across all these terms is the horrible

provided .

### 6.2. Estimating the Contour Error

We now need to measure the error along the contour, taken from . Throughout assume . Naturally, to estimate the integral, we seek good estimates on

For this we appeal to the Hadamard expansion. We break into a couple cases.

- First, let’s look at the integral when , so with large. We bound the horizontal integral along these regions; by symmetry let’s consider just the top
Thus we want an estimate of .

**Lemma 13**Let be such that , . Assume is primitive (possibly trivial), and that is not within of any zeros of . Then

*Proof:*Since we assumed that we need not worry about and so we obtainand we eliminate by computing

where

by Stirling (here we use the fact that ). For the terms where we see that

So the contribution of the sum for can be bounded by , via the vertical sum lemma.

As for the zeros with smaller imaginary part, we at least have and thus we can reduce the sum to just

Now by the assumption that ; so the terms of the sum are all at most . Also, there are zeros with imaginary part in that range. Finally, we recall that is bounded; we can write it using its (convergent) Dirichlet series and then note it is at most .

At this point, we perturb as described in vertical distribution so that the lemma applies, and use can then compute - Next, for the integral , we use the “far-left” estimate to obtain
So the contribution in this case is .

- Along the horizontal integral, we can use the same bound
which vanishes as .

So we only have two error terms, and . The first is clearly larger, so we end with

### 6.3. The term

We can estimate as follows:

**Lemma 14**

For primitive . we have

*Proof:* The idea is to look at . By subtraction, we obtain

Then at (eliminating the poles), we have

where the is if and for . Furthermore,

which is by our vertical distribution results, and similarly

This completes the proof.

Let be a Siegel zero, if any; for all the other zeros, we have that . We now have two cases.

- . Then is complex and thus has no exceptional zeros; hence each of its zeros has ; since is a zero of if and only if is a zero of , it follows that all zeros of are have . Moreover, in the range there are zeros (putting in our earlier lemma on vertical distribution).

Thus, total contribution of the sum is .

- If , then is real. The above argument goes through, except that we may have an extra Siegel zero at ; hence there will also be a special zero at . We pull these terms out separately.Consequently,
By adjusting the constant, we may assume if it exists.

## 7. Computing and

### 7.1. Summing the Error Terms

We now have, for any , , and modulo possibly primitive or trivial, the equality

where

Assume now that , and is an integer (hence ). Then aggregating all the errors gives

where the sum over now excludes the Siegel zero. We can omit the terms , and also

Absorbing things into the error term,

### 7.2. Estimating the Sum Over Zeros

Now we want to estimate

We do this is the dumbest way possible: putting a bound on and pulling it out.

For any non-Siegel zero, we have a zero-free region , whence

Pulling this out, we can then estimate the reciprocals by using our differential:

Hence,

We select

for some constant , and moreover assume , then we obtain

### 7.3. Summing Up

We would like to sum over all characters . However, we’re worried that there might be lots of Siegel zeros across characters. A result of Landau tells us this is not the case:

**Theorem 15****(Landau)**If and are real nontrivial primitive characters modulo and , then for any zeros and we have

for some fixed absolute . In particular, for any fixed , there is at most one with a Siegel zero.

*Proof:*The character is not trivial, so we can putNow we use a silly trick:

by “Simon’s Favorite Factoring Trick” (we use the deep fact that , the analog of ). The upper bounds give now

and one may deduce the conclusion from here.

We now sum over all characters as before to obtain

where is the character with a Siegel zero, if it exists.

## 8. Siegel’s Theorem, and Finishing Off

The term with is bad, and we need some way to get rid of it. We now appeal to Siegel’s Theorem:

**Theorem 16****(Siegel’s Theorem)**For any there is a such that any Siegel zero satisfies

Thus for a positive constant , assuming , letting means , so we obtain

Then

where . This completes the proof of Dirichlet’s Theorem.