In this post I will sketch a proof of Dirichlet’s Theorem in the following form:
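Theorem 1 (Dirichlet’s Theorem on Arithmetic Progressions)
Let
$$\psi(x; q, a) = \sum_{\substack{n \le x \\ n \equiv a \pmod q}} \Lambda(n).$$
Let $N$ be a positive constant. Then for some constant $C(N) > 0$ depending on $N$, we have for any $q$ such that $q \le (\log x)^N$, and any $a$ with $\gcd(a, q) = 1$, that
$$\psi(x; q, a) = \frac{x}{\phi(q)} + O\left( x \exp\left( -C(N) \sqrt{\log x} \right) \right)$$
uniformly in $q$.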
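As a quick numerical sanity check (not part of the proof), here is a minimal Python sketch. It assumes $\psi(x; q, a)$ denotes the sum of the von Mangoldt function $\Lambda(n)$ over $n \le x$ with $n \equiv a \pmod q$. The helper names `von_mangoldt_table`, `psi_progression` and `euler_phi` are made up for this illustration.

```python
from math import gcd, log

def von_mangoldt_table(x):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= x."""
    lam = [0.0] * (x + 1)
    is_composite = [False] * (x + 1)
    for p in range(2, x + 1):
        if is_composite[p]:
            continue
        # p is prime: cross out its multiples, then mark its powers.
        for m in range(p * p, x + 1, p):
            is_composite[m] = True
        pk = p
        while pk <= x:
            lam[pk] = log(p)
            pk *= p
    return lam

def psi_progression(x, q, a, lam=None):
    """Sum of Lambda(n) over n <= x with n = a (mod q)."""
    if lam is None:
        lam = von_mangoldt_table(x)
    return sum(lam[n] for n in range(1, x + 1) if n % q == a % q)

def euler_phi(q):
    """Euler's totient, by brute force (fine for small q)."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)

if __name__ == "__main__":
    x, q = 100_000, 7
    lam = von_mangoldt_table(x)
    for a in range(1, q):
        ratio = psi_progression(x, q, a, lam) * euler_phi(q) / x
        print(a, round(ratio, 4))  # each ratio should be close to 1
```

The theorem is of course a statement about large $x$ with $q$ growing slowly; a single small experiment like this only shows that the main term $x / \phi(q)$ is the right size.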
Prerequisites: complex analysis, previous two posts, possibly also Dirichlet characters. It is probably also advisable to read the last chapter of Hildebrand first, since it contains a much more thorough treatment of an easier version in which the zeros of $L$-functions are less involved.
Warning: I really don’t understand what I am saying. It is at least 50% likely that this post contains a major error, and 90% likely that there will be multiple minor errors. Please kindly point out any screw-ups of mine; thanks!
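Throughout this post: $s = \sigma + it$ and $\rho = \beta + i\gamma$, as always. All $O$-estimates have absolute constants unless noted otherwise, and $A \ll B$ means $A = O(B)$, $A \asymp B$ means $A \ll B \ll A$. By abuse of notation, $\mathcal L$ will be short for either $\log \left[ q(|T| + 2) \right]$ or $\log \left[ q(|t| + 2) \right]$, depending on context.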
1. Outline
Here are the main steps:
- We introduce Dirichlet characters $\chi$, which will serve as a roots of unity filter, extracting the terms with $n \equiv a \pmod q$. We will see that this reduces the problem to estimating the function $\psi(x, \chi) = \sum_{n \le x} \chi(n) \Lambda(n)$.
Possibly helpful diagram:

The pink dots denote zeros; by the Generalized Riemann Hypothesis we expect the nontrivial ones to all lie on the line $\operatorname{Re} s = \frac12$, but they could actually be anywhere in the green strip.
2. Dirichlet Characters
2.1. Definitions
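Recall that a Dirichlet character $\chi$ modulo $q$ is a completely multiplicative function $\chi : \mathbb N \rightarrow \mathbb C$ which is also periodic modulo $q$, and vanishes for all $n$ with $\gcd(n, q) > 1$. The trivial character (denoted $\chi_0$) is defined by $\chi_0(n) = 1$ when $\gcd(n, q) = 1$ and $\chi_0(n) = 0$ otherwise.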
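In particular, $\chi(n)^{\phi(q)} = \chi(n^{\phi(q)}) = \chi(1) = 1$ whenever $\gcd(n, q) = 1$, and thus each nonzero $\chi$ value is a $\phi(q)$-th root of unity (not necessarily a primitive one); there are also exactly $\phi(q)$ Dirichlet characters modulo $q$. Observe that $\chi(-1)^2 = \chi(1) = 1$, so $\chi(-1) = \pm 1$. We shall call $\chi$ even if $\chi(-1) = +1$ and odd otherwise.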
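If $\tilde q \mid q$, then a character $\tilde\chi$ modulo $\tilde q$ induces a character $\chi$ modulo $q$ in a natural way: let $\chi = \tilde\chi$ except at the points where $\gcd(n, q) > 1$ but $\gcd(n, \tilde q) = 1$, letting $\chi$ be zero at these points instead. (In effect, we are throwing away information about $\tilde\chi$.) A character $\chi$ not induced by any smaller character is called primitive.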
2.2. Orthogonality
The key fact about Dirichlet characters which will enable us to prove the theorem is the following trick:
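Theorem 2 (Orthogonality of Dirichlet Characters)
We have
$$\sum_{\chi \bmod q} \chi(a) \overline{\chi}(b) = \begin{cases} \phi(q) & \text{if } a \equiv b \pmod q \text{ and } \gcd(a, q) = 1 \\ 0 & \text{otherwise.} \end{cases}$$
(Here $\overline{\chi}$ is the conjugate of $\chi$, which is essentially a multiplicative inverse.)
This is in some senses a slightly fancier form of the old roots of unity filter. Specifically, it is not too hard to show that $\sum_{\chi} \chi(n)$ vanishes for $n \not\equiv 1 \pmod q$ while it is equal to $\phi(q)$ for $n \equiv 1 \pmod q$.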
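To see the orthogonality relation concretely, here is a small Python sketch. It makes a simplifying assumption that is not in the post: the modulus $q$ is prime, so the unit group is cyclic and every character has the form $\chi_j(g^k) = e^{2\pi i jk/(q-1)}$ for a primitive root $g$. The function names are hypothetical.

```python
import cmath

def primitive_root(q):
    """Smallest generator of (Z/qZ)^* for a prime q (brute force)."""
    for g in range(1, q):
        if len({pow(g, k, q) for k in range(q - 1)}) == q - 1:
            return g
    raise ValueError("no primitive root found; is q prime?")

def characters_mod_prime(q):
    """All q - 1 Dirichlet characters modulo a prime q.

    Each character is a dict mapping a residue n in 0..q-1 to chi(n).
    """
    g = primitive_root(q)
    index = {pow(g, k, q): k for k in range(q - 1)}  # discrete log table
    chars = []
    for j in range(q - 1):
        chi = {0: 0j}  # chi vanishes on multiples of q
        for n, k in index.items():
            chi[n] = cmath.exp(2j * cmath.pi * j * k / (q - 1))
        chars.append(chi)
    return chars

def orthogonality_sum(chars, a, b, q):
    """Sum over all chi of chi(a) * conj(chi(b))."""
    return sum(chi[a % q] * chi[b % q].conjugate() for chi in chars)

if __name__ == "__main__":
    q = 7
    chars = characters_mod_prime(q)
    for a in range(1, q):
        row = [round(abs(orthogonality_sum(chars, a, b, q))) for b in range(1, q)]
        print(a, row)  # 6 on the diagonal (a = b), 0 elsewhere
```

For composite $q$ one would need to handle each prime power factor separately; the prime case is enough to watch the filter at work.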
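2.3. Dirichlet $L$-Functions
Now we can define the associated $L$-function by
$$L(s, \chi) = \sum_{n \ge 1} \chi(n) n^{-s} = \prod_p \left( 1 - \chi(p) p^{-s} \right)^{-1}.$$
The properties of these $L$-functions are that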
The proof is pretty much the same as for zeta.
Observe that if $q = 1$, then $L(s, \chi) = \zeta(s)$.
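As an illustration of the Dirichlet series and Euler product agreeing, here is a minimal Python sketch for one specific character. It assumes the usual definition $L(s, \chi) = \sum_n \chi(n) n^{-s} = \prod_p (1 - \chi(p) p^{-s})^{-1}$ for $\operatorname{Re} s > 1$, and uses the nontrivial character modulo $4$ (called `chi4` here, a name chosen for this example), for which $L(2, \chi)$ is Catalan's constant.

```python
def chi4(n):
    """The nontrivial character modulo 4: +1 on 1 mod 4, -1 on 3 mod 4, 0 on evens."""
    return (0, 1, 0, -1)[n % 4]

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [p for p in range(2, n + 1) if sieve[p]]

def dirichlet_series(s, terms):
    """Partial sum of sum_n chi4(n) / n^s."""
    return sum(chi4(n) / n ** s for n in range(1, terms + 1))

def euler_product(s, prime_bound):
    """Partial product of prod_p (1 - chi4(p) / p^s)^(-1) over p <= prime_bound."""
    value = 1.0
    for p in primes_up_to(prime_bound):
        value /= 1.0 - chi4(p) / p ** s
    return value

if __name__ == "__main__":
    s = 2.0
    print(dirichlet_series(s, 100_000))
    print(euler_product(s, 100_000))
    # Both should be near Catalan's constant 0.9159655..., the value of L(2, chi4).
```

The partial Euler product converges more slowly than the alternating series, since its tail is only controlled by $\sum_{p > P} p^{-2}$.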
2.4. The Functional Equation for Dirichlet $L$-Functions
While I won’t prove it here, one can show the following analog of the functional equation for Dirichlet $L$-functions.
Unlike the
case, the
is nastier to describe; computing it involves some Gauss sums that would be too involved for this post. However, I should point out that it is the Gauss sum here that requires
to be primitive. As before,
gives us a meromorphic continuation of
in the entire complex plane. We obtain trivial zeros of
as follows:
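- For $\chi$ trivial (the $\zeta$ case), we get zeros at $-2$, $-4$, $-6$ and so on.
- For $\chi$ nontrivial and even, we get zeros at $0$, $-2$, $-4$, $-6$ and so on (since the pole of $\Gamma(\frac s2)$ at $s = 0$ is no longer canceled).
- For $\chi$ odd, we get zeros at $-1$, $-3$, $-5$ and so on.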
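The reasoning behind this list can be spelled out as follows for a nontrivial primitive character. This is a sketch assuming the standard normalization of the completed $L$-function; the version in the functional equation above may differ by harmless nonvanishing factors.

```latex
% Assumption: \chi is primitive and nontrivial modulo q, with \chi(-1) = (-1)^a
% for a \in \{0, 1\}, and the completed function below is entire.
\[
  \xi(s, \chi) = \left( \frac{q}{\pi} \right)^{(s+a)/2}
                 \Gamma\!\left( \frac{s+a}{2} \right) L(s, \chi).
\]
% \Gamma((s+a)/2) has simple poles exactly where (s+a)/2 \in \{0, -1, -2, \dots\}:
\[
  s = -a, \; -a-2, \; -a-4, \; \dots
\]
% Since \xi(s, \chi) is entire, L(s, \chi) must vanish at each of these points:
%   a = 0 (\chi even): s = 0, -2, -4, \dots
%   a = 1 (\chi odd):  s = -1, -3, -5, \dots
```

In the $\zeta$ case the completed function carries an extra factor vanishing at $s = 0$, which is what cancels the pole of $\Gamma(\frac s2)$ there and removes $0$ from the list.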
3. Obtaining the Contour Integral
3.1. Orthogonality
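Using the trick of orthogonality, we may write
$$\psi(x; q, a) = \sum_{n \le x} \Lambda(n) \cdot \frac{1}{\phi(q)} \sum_{\chi \bmod q} \chi(n) \overline{\chi}(a) = \frac{1}{\phi(q)} \sum_{\chi \bmod q} \overline{\chi}(a) \sum_{n \le x} \chi(n) \Lambda(n).$$
To do this we have to estimate the sum $\psi(x, \chi) = \sum_{n \le x} \chi(n) \Lambda(n)$.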
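Here is a small Python sketch of this decomposition for the modulus $q = 4$, which has exactly two characters. It assumes the notation $\psi(x, \chi) = \sum_{n \le x} \chi(n) \Lambda(n)$ and $\psi(x; q, a) = \sum_{n \le x,\ n \equiv a} \Lambda(n)$; the function names are invented for the example.

```python
from math import log

def von_mangoldt(n):
    """Lambda(n): log p if n = p^k for a prime p and k >= 1, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)  # n itself is prime

def chi0(n):
    """Trivial character modulo 4."""
    return 1 if n % 2 == 1 else 0

def chi4(n):
    """Nontrivial character modulo 4."""
    return (0, 1, 0, -1)[n % 4]

CHARACTERS = (chi0, chi4)  # all phi(4) = 2 characters modulo 4

def psi_chi(x, chi):
    """psi(x, chi) = sum over n <= x of chi(n) * Lambda(n)."""
    return sum(chi(n) * von_mangoldt(n) for n in range(1, x + 1))

def psi_direct(x, a):
    """psi(x; 4, a), summed directly over n = a (mod 4)."""
    return sum(von_mangoldt(n) for n in range(1, x + 1) if n % 4 == a % 4)

def psi_via_characters(x, a):
    """psi(x; 4, a) = (1 / phi(4)) * sum_chi conj(chi(a)) * psi(x, chi)."""
    # Both characters modulo 4 are real, so conjugation is a no-op here.
    return sum(chi(a) * psi_chi(x, chi) for chi in CHARACTERS) / len(CHARACTERS)

if __name__ == "__main__":
    x = 2000
    for a in (1, 3):
        print(a, psi_direct(x, a), psi_via_characters(x, a))  # the two columns agree
```

Concretely this is just $\psi(x; 4, 1) = \frac12 (\psi(x, \chi_0) + \psi(x, \chi))$ and $\psi(x; 4, 3) = \frac12 (\psi(x, \chi_0) - \psi(x, \chi))$ for the nontrivial $\chi$ modulo $4$.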
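3.2. Introducing the Logarithmic Derivative of the $L$-Function
First, we realize $\chi(n) \Lambda(n)$ as the coefficients of a Dirichlet series. Recall last time we saw that $-\frac{\zeta'}{\zeta}$ gave $\Lambda$ as coefficients. We can do the same thing with $L$-functions: put
$$\log L(s, \chi) = -\sum_p \log \left( 1 - \chi(p) p^{-s} \right).$$
Taking the derivative, we obtain
Theorem 5
For any $\chi$ (possibly trivial or imprimitive) we have
$$-\frac{L'(s, \chi)}{L(s, \chi)} = \sum_{n \ge 1} \Lambda(n) \chi(n) n^{-s}.$$
Proof:
$$-\frac{L'(s, \chi)}{L(s, \chi)} = \sum_p \frac{\chi(p) p^{-s} \log p}{1 - \chi(p) p^{-s}} = \sum_p \log p \sum_{k \ge 1} \chi(p^k) p^{-ks} = \sum_{n \ge 1} \Lambda(n) \chi(n) n^{-s}$$
as desired.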
3.3. The Truncation Trick
Now, we unveil the trick at the heart of the proof of Perron’s Formula in the last post. I will give a more precise statement this time, by stating where this integral comes from:
Lemma 6 (Truncated Version of Perron Lemma)
For any
define

Then
where
is the indicator function defined by

and the error term
is given by

In particular,
.
In effect, the integral from $c - iT$ to $c + iT$ is intended to mimic an indicator function. We can use it to extract the terms of the Dirichlet series of $-\frac{L'}{L}(s, \chi)$ which happen to have $n \le x$, by simply applying the lemma to the ratio $x/n$. Unfortunately, we cannot take $T = \infty$ because later on this would introduce a sum which is not absolutely convergent, meaning we will have to live with the error term introduced by picking a particular finite value of $T$.
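To watch the truncated integral imitate a step function, here is a rough numerical sketch in Python. It assumes the integral in question is $\frac{1}{2\pi i} \int_{c-iT}^{c+iT} \frac{y^s}{s} \, ds$, and compares against the classical error bound $y^c \min \left( 1, \frac{1}{T \left\lvert \log y \right\rvert} \right)$ for $y \neq 1$ and $c/T$ for $y = 1$ (the form found in Davenport; I am assuming it matches the lemma above). The function names and the step count are arbitrary choices.

```python
import cmath
from math import log, pi

def truncated_perron(y, c, T, steps=20_000):
    """Approximate (1 / (2 pi i)) * integral of y^s / s ds over s = c - iT .. c + iT.

    With s = c + it we have ds = i dt, so the integral equals
    (1 / (2 pi)) * integral over t in [-T, T] of y^(c + it) / (c + it) dt,
    which is evaluated here with the composite trapezoid rule.
    """
    h = 2.0 * T / steps
    log_y = log(y)

    def integrand(t):
        s = complex(c, t)
        return cmath.exp(s * log_y) / s

    total = 0.5 * (integrand(-T) + integrand(T))
    for k in range(1, steps):
        total += integrand(-T + k * h)
    return (total * h / (2.0 * pi)).real  # imaginary parts cancel by symmetry

def indicator(y):
    """The step function the integral is meant to imitate."""
    if y > 1:
        return 1.0
    if y < 1:
        return 0.0
    return 0.5

if __name__ == "__main__":
    c, T = 1.0, 50.0
    for y in (0.5, 0.9, 1.0, 1.1, 2.0):
        approx = truncated_perron(y, c, T)
        print(y, round(approx, 4), indicator(y))
```

The approximation is visibly worst for $y$ close to $1$, which is exactly why the terms with $n$ near $x$ need separate care when the truncation error is estimated.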
3.4. Applying the Truncation
Let’s do so: define

which is almost the same as
, except that if
is actually an integer then
should be halved (since
). Now, we can substitute in our integral representation, and obtain

where

Estimating this is quite ugly, so we defer it to later.
4. Applying the Residue Theorem
4.1. Primitive Characters
Exactly like before, we are going to use a contour to estimate the value of

Let
be a large half-integer (so no zeros of
with
). We then re-route the integration path along the contour integral

During this process we pick up residues, which are the interesting terms.
First, assume that
is primitive, so the functional equation applies and we get the information we want about zeros.
- If
, then we pick up a residue of
corresponding to

This is the “main term”. Per laziness,
it is.
- Depending on whether
is odd or even, we detect the trivial zeros, which we can express succinctly by

Actually, I really ought to truncate this at
, but since I’m going to let
in a moment I really don’t want to take the time to do so; the difference is negligible.
- We obtain a residue of
at
, which we denote
, for
. Observe that if
is even, this is the constant term of
near
(but there is a pole of the whole function at
); otherwise it equals the value of
straight-out.
- If
is even then
itself has a zero, so we are in worse shape. We recall that

and notice that

so we pick up an extra residue of
. So, call this a bonus of 
- Finally, the hard-to-understand zeros in the strip
. If
is a zero, then it contributes a residue of
. We only pick up the zeros with
in our rectangle, so we get a term

Letting
we derive that

at least for primitive characters. Note that the sum over the zeros is not absolutely convergent without the restriction to
(with it, the sum becomes a finite one).
4.2. Transition to nonprimitive characters
The next step is to notice that if
modulo
happens to be not primitive, and is induced by
with modulus
, then actually
and
are not so different. Specifically, they differ by at most

and so our above formula in fact holds for any character
, if we are willing to add an error term of
. This works even if
is trivial, and also
, so we will just simplify notation by omitting the tildes.
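To get a feel for how little is lost, here is a Python sketch with one concrete pair of characters (my choice, not from the post): the nontrivial character modulo $4$ and the character modulo $20$ it induces. The two sums $\sum_{n \le x} \chi(n) \Lambda(n)$ can only differ at prime powers $p^k \le x$ with $p$ dividing the larger modulus, so the gap is at most $\sum_{p \mid q} \log p \left\lfloor \frac{\log x}{\log p} \right\rfloor \ll (\log q)(\log x)$. All function names are hypothetical.

```python
from math import gcd, log

def von_mangoldt(n):
    """Lambda(n): log p if n = p^k for a prime p and k >= 1, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)  # n itself is prime

def chi4(n):
    """Nontrivial (primitive) character modulo 4."""
    return (0, 1, 0, -1)[n % 4]

def induced_mod_20(n):
    """The character modulo 20 induced by chi4: zeroed wherever gcd(n, 20) > 1."""
    return chi4(n) if gcd(n, 20) == 1 else 0

def psi(x, chi):
    """Sum over n <= x of chi(n) * Lambda(n)."""
    return sum(chi(n) * von_mangoldt(n) for n in range(1, x + 1))

def psi_gap(x):
    """Absolute difference of the two sums; only powers of 5 contribute."""
    return abs(psi(x, chi4) - psi(x, induced_mod_20))

if __name__ == "__main__":
    for x in (10, 100, 1000, 10_000):
        # gap, followed by the crude comparison value (log q)(log x) with q = 20
        print(x, round(psi_gap(x), 4), round(log(20) * log(x), 4))
```

In this example the gap is exactly $\log 5$ times the number of powers of $5$ up to $x$, so it grows like $\log x$ while the sums themselves have size comparable to $x$.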
Anyways
is piddling compared to all the other error terms in the problem, and we can swallow a lot of the boring residues into a new term, say

Thus we have

Unfortunately, the constant
depends on
and cannot be absorbed. We will also estimate
in the error term party.
5. Distribution of Zeros
In order to estimate

we will need information on both the vertical and horizontal distribution of the zeros. Also, it turns out this will help us compute
.
5.1. Applying Hadamard’s Theorem
Let
be primitive modulo
. As we saw,

is entire. It also is easily seen to have order
, since no term grows much more than exponentially in
(using Stirling to handle the
factor). Thus by Hadamard, we may put

Taking a logarithmic derivative and cleaning up, we derive the following lemma.
Lemma 7 (Hadamard Expansion of Logarithmic Derivative)
For any primitive character
(possibly trivial) we have

Proof: On one hand, we have

On the other hand

Taking the derivative of both sides and setting them equal: we have on the left side

and on the right-hand side

Equating these gives the desired result. 
This will be useful in controlling things later. The
is a constant that turns out to be surprisingly annoying; it is tied to
from the contour, so we will need to deal with it.
5.2. A Bound on the Logarithmic Derivative
Frequently we will take the real part of this. Using Stirling, the short version of this is:
Lemma 8 (Logarithmic Derivative Bound)
Let
and
be primitive (possibly trivial). Then
![\displaystyle \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] = \begin{cases} O(\mathcal L) - \text{Re } \sum_\rho \frac{1}{s-\rho} + \text{Re } \frac{\delta(\chi)}{s-1} & 1 \le \sigma \le 2 \\ O(\mathcal L) - \text{Re } \sum_\rho \frac{1}{s-\rho} & 1 \le \sigma \le 2, \left\lvert t \right\rvert \ge 2 \\ O(1) & \sigma \ge 2. \end{cases}](https://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Ctext%7BRe+%7D+%5Cleft%5B+-%5Cfrac%7BL%27%28%5Csigma%2Bit%2C+%5Cchi%29%7D%7BL%28%5Csigma%2Bit%2C+%5Cchi%29%7D+%5Cright%5D+%3D+%5Cbegin%7Bcases%7D+O%28%5Cmathcal+L%29+-+%5Ctext%7BRe+%7D+%5Csum_%5Crho+%5Cfrac%7B1%7D%7Bs-%5Crho%7D+%2B+%5Ctext%7BRe+%7D+%5Cfrac%7B%5Cdelta%28%5Cchi%29%7D%7Bs-1%7D+%26+1+%5Cle+%5Csigma+%5Cle+2+%5C%5C+O%28%5Cmathcal+L%29+-+%5Ctext%7BRe+%7D+%5Csum_%5Crho+%5Cfrac%7B1%7D%7Bs-%5Crho%7D+%26+1+%5Cle+%5Csigma+%5Cle+2%2C+%5Cleft%5Clvert+t+%5Cright%5Crvert+%5Cge+2+%5C%5C+O%281%29+%26+%5Csigma+%5Cge+2.+%5Cend%7Bcases%7D+&bg=ffffff&fg=000000&s=0&c=20201002)
Proof: The claim is obvious for
, since we can then bound the quantity by
due to the fact that the series representation is valid in that range. The second part with
follows from the first line, by noting that
. So it just suffices to show that

where
and
is primitive.
First, we claim that
. We use the following trick:

where the ends come from taking the logarithmic derivative directly. By switching
with
, the claim follows.
Then, the lemma follows rather directly; the
has miraculously canceled with
. To be explicit, we now have

and the first two terms contribute
and
, respectively; meanwhile the term
is at most
, so it is absorbed. 
Short version: our functional equation lets us relate
to
for
(in fact it’s all we have!) so this gives the following corresponding estimate:
Lemma 9 (Far-Left Estimate of Log Derivative)
If
and
we have
![\displaystyle \frac{L'(s, \chi)}{L(s, \chi)} = O\left[ \log q\left\lvert s \right\rvert \right].](https://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cfrac%7BL%27%28s%2C+%5Cchi%29%7D%7BL%28s%2C+%5Cchi%29%7D+%3D+O%5Cleft%5B+%5Clog+q%5Cleft%5Clvert+s+%5Cright%5Crvert+%5Cright%5D.+&bg=ffffff&fg=000000&s=0&c=20201002)
Proof:
We have

(the unsymmetric functional equation, which can be obtained from Legendre’s duplication formula). Taking a logarithmic derivative yields

Because we assumed
, the tangent function is bounded as
is sufficiently far from any of its poles along the real axis. Also since
implies the
term is bounded. Finally, the logarithmic derivative of
contributes
according to Stirling. So, total error is
and this gives the conclusion. 
5.3. Horizontal Distribution
I claim that:
Such bad zeros are called Siegel zeros, and I will denote them
. The important part about this estimate is that it does not depend on
but rather on
. We need the relaxation to non-primitive characters, since we will use them in the proof of Landau’s Theorem.
Proof: First, assume
is both primitive and nontrivial.
By the 3-4-1 lemma on
we derive that
![\displaystyle 3 \text{Re } \left[ -\frac{L'(\sigma, \chi_0)}{L(\sigma, \chi_0)} \right] + 4 \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] + \text{Re } \left[ -\frac{L'(\sigma+2it, \chi^2)}{L(\sigma+2it, \chi^2)} \right] \ge 0.](https://s0.wp.com/latex.php?latex=%5Cdisplaystyle+3+%5Ctext%7BRe+%7D+%5Cleft%5B+-%5Cfrac%7BL%27%28%5Csigma%2C+%5Cchi_0%29%7D%7BL%28%5Csigma%2C+%5Cchi_0%29%7D+%5Cright%5D+%2B+4+%5Ctext%7BRe+%7D+%5Cleft%5B+-%5Cfrac%7BL%27%28%5Csigma%2Bit%2C+%5Cchi%29%7D%7BL%28%5Csigma%2Bit%2C+%5Cchi%29%7D+%5Cright%5D+%2B+%5Ctext%7BRe+%7D+%5Cleft%5B+-%5Cfrac%7BL%27%28%5Csigma%2B2it%2C+%5Cchi%5E2%29%7D%7BL%28%5Csigma%2B2it%2C+%5Cchi%5E2%29%7D+%5Cright%5D+%5Cge+0.+&bg=ffffff&fg=000000&s=0&c=20201002)
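For completeness, here is where the inequality comes from. This is a sketch assuming that the “3-4-1 lemma” refers to the usual trigonometric identity $3 + 4\cos\theta + \cos 2\theta = 2(1 + \cos\theta)^2$ and that $\sigma > 1$, so the Dirichlet series for $-L'/L$ converges; the angle $\theta_n$ is notation introduced only for this computation.

```latex
% For \sigma > 1 and \gcd(n, q) = 1 write \chi(n) n^{-it} = e^{i\theta_n}; then
% \chi_0(n) = 1 and \chi^2(n) n^{-2it} = e^{2i\theta_n}.
\begin{align*}
  &3 \,\mathrm{Re}\left[ -\frac{L'}{L}(\sigma, \chi_0) \right]
   + 4 \,\mathrm{Re}\left[ -\frac{L'}{L}(\sigma + it, \chi) \right]
   + \mathrm{Re}\left[ -\frac{L'}{L}(\sigma + 2it, \chi^2) \right] \\
  &\qquad = \sum_{\gcd(n, q) = 1} \frac{\Lambda(n)}{n^{\sigma}}
     \left( 3 + 4\cos\theta_n + \cos 2\theta_n \right) \\
  &\qquad = \sum_{\gcd(n, q) = 1} \frac{\Lambda(n)}{n^{\sigma}}
     \cdot 2 \left( 1 + \cos\theta_n \right)^2 \;\ge\; 0.
\end{align*}
```

Every term in the last sum is nonnegative because $\Lambda(n) \ge 0$, which is all the inequality needs.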
This is cool because we already know that
![\displaystyle \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] < O(\mathcal L) - \text{Re } \sum_\rho \frac{1}{s-\rho}](https://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Ctext%7BRe+%7D+%5Cleft%5B+-%5Cfrac%7BL%27%28%5Csigma%2Bit%2C+%5Cchi%29%7D%7BL%28%5Csigma%2Bit%2C+%5Cchi%29%7D+%5Cright%5D+%3C+O%28%5Cmathcal+L%29+-+%5Ctext%7BRe+%7D+%5Csum_%5Crho+%5Cfrac%7B1%7D%7Bs-%5Crho%7D+&bg=ffffff&fg=000000&s=0&c=20201002)
We now assume
.
In particular, we now have (since
for any zero
)

So we are free to throw out as many terms as we want.
If
is primitive, then everything is clear. Let
be a zero. Then
![\displaystyle \begin{aligned} \text{Re } \left[ -\frac{L'(\sigma, \chi_0)}{L(\sigma, \chi_0)} \right] &\le \frac{1}{\sigma-1} + O(1) \\ \text{Re } \left[ -\frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} \right] &\le O(\mathcal L) - \frac{1}{s-\rho} \\ \text{Re } \left[ -\frac{L'(\sigma+2it, \chi^2)}{L(\sigma+2it, \chi^2)} \right] &\le O(\mathcal L) \end{aligned}](https://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cbegin%7Baligned%7D+%5Ctext%7BRe+%7D+%5Cleft%5B+-%5Cfrac%7BL%27%28%5Csigma%2C+%5Cchi_0%29%7D%7BL%28%5Csigma%2C+%5Cchi_0%29%7D+%5Cright%5D+%26%5Cle+%5Cfrac%7B1%7D%7B%5Csigma-1%7D+%2B+O%281%29+%5C%5C+%5Ctext%7BRe+%7D+%5Cleft%5B+-%5Cfrac%7BL%27%28%5Csigma%2Bit%2C+%5Cchi%29%7D%7BL%28%5Csigma%2Bit%2C+%5Cchi%29%7D+%5Cright%5D+%26%5Cle+O%28%5Cmathcal+L%29+-+%5Cfrac%7B1%7D%7Bs-%5Crho%7D+%5C%5C+%5Ctext%7BRe+%7D+%5Cleft%5B+-%5Cfrac%7BL%27%28%5Csigma%2B2it%2C+%5Cchi%5E2%29%7D%7BL%28%5Csigma%2B2it%2C+%5Cchi%5E2%29%7D+%5Cright%5D+%26%5Cle+O%28%5Cmathcal+L%29+%5Cend%7Baligned%7D+&bg=ffffff&fg=000000&s=0&c=20201002)
where we have dropped all but one term for the second line, and all terms for the third line. If
is not primitive but at least is not
, then we can replace
with the inducing
for a penalty of at most

just like earlier:
is usually zero, so we just look at the differing terms! The Dirichlet series really are practically the same. (Here we have also used the fact that
, and
.)
Consequently, we derive using
that

Selecting
so that
, we thus obtain

If we select
, we get

so

for some constant
, initially only for primitive
.
But because the Euler products of the $L$-function of an imprimitive character and of its primitive inducing character differ by only finitely many factors, each of which vanishes only on the line $\operatorname{Re} s = 0$, it follows that this holds for all nontrivial complex characters.
Unfortunately, if we are unlucky enough that
is trivial, then replacing
causes all hell to break loose. (In particular,
is real in this case!) The problem comes in that our new penalty has an extra
, so

Applied with
, we get the weaker

If
for some
then the
term will be at most
and we live to see another day. In other words, we have unconditionally established a zero-free region of the form

for any
.
Now let’s examine
. We don’t have the facilities to prove that there are no bad zeros, but let’s at least prove that the zero must be simple and real. By Hadamard at
, we have

where we no longer need the real parts since
is real, and in particular the roots of
come in conjugate pairs. The left-hand side can be stupidly bounded below by

So

In other words,

Then, let
, so

The rest is arithmetic; basically one finds that there can be at most one Siegel zero. In particular, since complex zeros come in conjugate pairs, that zero must be real.
It remains to handle the case that
is the constant function giving
. For this, we observe that the
-function in question is just
. Thus, we can decrease the constant
to some
in such a way that the result holds true for
, which completes the proof. 
5.4. Vertical Distribution
We have the following lemma:
Lemma 11 (Sum of Zeros Lemma)
For all real
and primitive characters
(possibly trivial), we have

Proof: We already have that

and we take
, noting that the left-hand side is bounded by a constant
. On the other hand,
and

as needed. 
From this we may deduce that
In particular, we may perturb any given
by
so that the distance between it and the nearest zero is at least
, for some absolute constant
.
From this, using the argument principle, we can actually also obtain the following: For a real number
, we have
is the number of zeros of
with imaginary part
. However, we will not need this fact.
6. Error Term Party
Up to now,
has been arbitrary. Assume now
; thus we can now follow the tradition

so
is just to the right of the critical strip. This causes
. We assume also for convenience that
.
6.1. Estimating the Truncation Error
Recall that

We need to bound the right-hand side of

If
, the log part is small, and this is bad. We have to split into three cases:
,
, and
. This is necessary because in the event that
(
is a prime power), then
needs to be handled differently.
We let
and
be the nearest prime powers to
other than
itself. Thus this breaks our region to conquer into

So we have possibly a center term (if
is a prime power, we have a term
), plus the far left interval and the far right interval. Let
for convenience.
Finally, for
outside the interval mentioned above, we in fact have
, say, and so all terms contribute at most

(Recall
had a simple pole at
, so near
it behaves like
.)
The sum of everything is
. Hence, the grand total across all these terms is the horrible

provided
.
6.2. Estimating the Contour Error
We now need to measure the error along the contour, taken from
. Throughout assume
. Naturally, to estimate the integral, we seek good estimates on

For this we appeal to the Hadamard expansion. We break into a couple cases.
- First, let’s look at the integral when
, so
with
large. We bound the horizontal integral along these regions; by symmetry let’s consider just the top

Thus we want an estimate of
.
Proof: Since we assumed that
we need not worry about
and so we obtain

and we eliminate
by computing

where

by Stirling (here we use the fact that
). For the terms where
we see that

So the contribution of the sum for
can be bounded by
, via the vertical sum lemma.
As for the zeros with smaller imaginary part, we at least have
and thus we can reduce the sum to just
![\displaystyle \frac{L'(\sigma+it, \chi)}{L(\sigma+it, \chi)} - \frac{L'(2+it, \chi)}{L(2+it, \chi)} = \sum_{\gamma\in[t-1,t+1]} \frac{1}{\sigma+it-\rho} + O(\mathcal L).](https://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cfrac%7BL%27%28%5Csigma%2Bit%2C+%5Cchi%29%7D%7BL%28%5Csigma%2Bit%2C+%5Cchi%29%7D+-+%5Cfrac%7BL%27%282%2Bit%2C+%5Cchi%29%7D%7BL%282%2Bit%2C+%5Cchi%29%7D+%3D+%5Csum_%7B%5Cgamma%5Cin%5Bt-1%2Ct%2B1%5D%7D+%5Cfrac%7B1%7D%7B%5Csigma%2Bit-%5Crho%7D+%2B+O%28%5Cmathcal+L%29.+&bg=ffffff&fg=000000&s=0&c=20201002)
Now by the assumption that
; so the terms of the sum are all at most
. Also, there are
zeros with imaginary part in that range. Finally, we recall that
is bounded; we can write it using its (convergent) Dirichlet series and then note it is at most
. 
At this point, we perturb
as described in the section on vertical distribution so that the lemma applies, and can then compute

- Next, for the integral
, we use the “far-left” estimate to obtain

So the contribution in this case is
.
- Along the horizontal integral, we can use the same bound

which vanishes as
.
So we only have two error terms,
and
. The first is clearly larger, so we end with

6.3. The term 
We can estimate
as follows:
Lemma 14
For primitive $\chi$, we have

Proof: The idea is to look at
. By subtraction, we obtain

Then at
(eliminating the poles), we have

where the
is
if
and
for
. Furthermore,

which is
by our vertical distribution results, and similarly

This completes the proof. 
Let
be a Siegel zero, if any; for all the other zeros, we have that
. We now have two cases.
- If $\overline{\chi} \neq \chi$, then $\overline{\chi}$ is complex and thus has no exceptional zeros; hence each of its zeros has
; since
is a zero of
if and only if
is a zero of
, it follows that all zeros of
have
. Moreover, in the range
there are
zeros (putting
in our earlier lemma on vertical distribution).
Thus, total contribution of the sum is
.
- If
, then
is real. The above argument goes through, except that we may have an extra Siegel zero at
; hence there will also be a special zero at
. We pull these terms out separately. Consequently,

By adjusting the constant, we may assume
if it exists.
7. Computing $\psi(x, \chi)$ and $\psi(x; q, a)$
7.1. Summing the Error Terms
We now have, for any
,
, and
modulo
possibly imprimitive or trivial, the equality

where

Assume now that
, and
is an integer (hence
). Then aggregating all the errors gives

where the sum over
now excludes the Siegel zero. We can omit the terms
, and also

Absorbing things into the error term,

7.2. Estimating the Sum Over Zeros
Now we want to estimate

We do this in the dumbest way possible: putting a bound on
and pulling it out.
For any non-Siegel zero, we have a zero-free region
, whence

Pulling this out, we can then estimate the reciprocals by using our differential:

Hence,

We select

for some constant
, and moreover assume
, then we obtain

7.3. Summing Up
We would like to sum over all characters
. However, we’re worried that there might be lots of Siegel zeros across characters. A result of Landau tells us this is not the case:
Proof: The character
is not trivial, so we can put

Now we use a silly trick:

by “Simon’s Favorite Factoring Trick” (we use the deep fact that
, the analog of
). The upper bounds give now

and one may deduce the conclusion from here. 
We now sum over all characters
as before to obtain

where
is the character with a Siegel zero, if it exists.
8. Siegel’s Theorem, and Finishing Off
The term with
is bad, and we need some way to get rid of it. We now appeal to Siegel’s Theorem:
Theorem 16 (Siegel’s Theorem)
For any
there is a
such that any Siegel zero
satisfies

Thus for a positive constant
, assuming
, letting
means
, so we obtain

Then

where
. This completes the proof of Dirichlet’s Theorem.