In this post I will sketch a proof Dirichlet Theorem’s in the following form:
Theorem 1 (Dirichlet’s Theorem on Arithmetic Progression)
Let
Let be a positive constant. Then for some constant
depending on
, we have for any
such that
we have
uniformly in .
Prerequisites: complex analysis, previous two posts, possibly also Dirichlet characters. It is probably also advisable to read the last chapter of Hildebrand first, since this contains a much more thorough version of an easier version in which the zeros of -functions are less involved.
Warning: I really don’t understand what I am saying. It is at least 50% likely that this post contains a major error, and 90% likely that there will be multiple minor errors. Please kindly point out any screw-ups of mine; thanks!
Throughout this post: and
, as always. All
-estimates have absolute constants unless noted otherwise, and
means
,
means
. By abuse of notation,
will be short for either
or
, depending on context.
1. Outline
Here are the main steps:
- We introduce Dirichlet character
which will serves as a roots of unity filter, extracting terms
. We will see that this reduces the problem to estimating the function
.
- Introduce the
-function
, the generalization of
for arithmetic progressions. Establish a functional equation in terms of
, much like with
, and use it to extend
to a meromorphic function in the entire complex plane.
- We will use a variation on the Perron transformation in order to transform this sum into an integral involving an
-function
. We truncate this integral to
; this introduces an error
that can be computed immediately, though in this presentation we delay its computation until later.
- We do a contour as in the proof of the Prime Number Theorem in order to estimate the above integral in terms of the zeros of
. The main term emerges as a residue, so we want to show that the integral
along this integral goes to zero. Moreover, we get some residues
related to the zeros of the
-function.
- By using Hadamard’s Theorem on
which is entire, we can write
in terms of its zeros. This has three consequences:
- We can use the previous to get bounds on
.
- Using a 3-4-1 trick, this gives us information on the horizontal distribution of
; the dreaded Siegel zeros appear here.
- We can get an expression which lets us estimate the vertical distribution of the zeros in the critical strip (specifically the number of zeros with
).
The first and third points let us compute
.
- We can use the previous to get bounds on
- The horizontal zero-free region gives us an estimate of
, which along with
and
gives us the value of
.
- We use Siegel’s Theorem to handle the potential Siegel zero that might arise.
Possibly helpful diagram:
The pink dots denote zeros; we think the nontrivial ones all lie on the half-line by the Generalized Riemann Hypothesis but they could actually be anywhere in the green strip.
2. Dirichlet Characters
2.1. Definitions
Recall that a Dirichlet character modulo
is a completely multiplicative function
which is also periodic modulo
, and vanishes for all
with
. The trivial character (denoted
) is defined by
when
and
otherwise.
In particular, and thus each nonzero
value is a
-th primitive root of unity; there are also exactly
Dirichlet characters modulo
. Observe that
, so
. We shall call
even if
and odd otherwise.
If , then a character
modulo
induces a character
modulo
in a natural way: let
except at the points where
but
, letting
be zero at these points instead. (In effect, we are throwing away information about
.) A character
not induced by any smaller character is called primitive.
2.2. Orthogonality
The key fact about Dirichlet characters which will enable us to prove the theorem is the following trick:
Theorem 2 (Orthogonality of Dirichlet Characters)
We have
(Here is the conjugate of
, which is essentially a multiplicative inverse.)
This is in some senses a slightly fancier form of the old roots of unity filter. Specifically, it is not too hard to show that vanishes for
while it is equal to
for
.
2.3. Dirichlet
-Functions
Now we can define the associated -function by
The properties of these -functions are that
Theorem 3
Let be a Dirichlet character modulo
. Then
- If
,
can be extended to a holomorphic function on
.
- If
,
can be extended to a meromorphic function on
, with a single simple pole at
of residue
.
The proof is pretty much the same as for zeta.
Observe that if , then
.
2.4. The Functional Equation for Dirichlet
-Functions
While I won’t prove it here, one can show the following analog of the functional equation for Dirichlet -functions.
Theorem 4 (The Functional Equation of Dirichlet -Functions)
Assume that is a character modulo
, possibly trivial or imprimitive. Let
if
is even and
if
is odd. Let
where
and if
and zero otherwise. Then
is entire.
- If
is primitive, then
for some complex number
.
Unlike the case, the
is nastier to describe; computing it involves some Gauss sums that would be too involved for this post. However, I should point out that it is the Gauss sum here that requires
to be primitive. As before,
gives us an meromorphic continuation of
in the entire complex plane. We obtain trivial zeros of
as follows:
- For
even, we get zeros at
,
,
and so on.
- For
even, we get zeros at
,
,
,
and so on (since the pole of
at
is no longer canceled).
- For
odd, we get zeros at
,
,
and so on.
3. Obtaining the Contour Integral
3.1. Orthogonality
Using the trick of orthogonality, we may write
To do this we have to estimate the sum .
3.2. Introducing the Logarithmic Derivative of the
-Function
First, we realize as the coefficients of a Dirichlet series. Recall last time we saw that
gave
as coefficients. We can do the same thing with
-functions: put
Taking the derivative, we obtain
Theorem 5
For any (possibly trivial or imprimitive) we have
Proof:
as desired.
3.3. The Truncation Trick
Now, we unveil the trick at the heart of the proof of Perron’s Formula in the last post. I will give a more precise statement this time, by stating where this integral comes from:
Lemma 6 (Truncated Version of Perron Lemma)
For any define
Then where
is the indicator function defined by
and the error term is given by
In particular, .
In effect, the integral from to
is intended to mimic an indicator function. We can use it to extract the terms of the Dirichlet series of
which happen to have
, by simply appealing to
. Unfortunately, we cannot take
because later on this would introduce a sum which is not absolutely convergent, meaning we will have to live with the error term introduced by picking a particular finite value of
.
3.4. Applying the Truncation
Let’s do so: define
which is almost the same as , except that if
is actually an integer then
should be halved (since
). Now, we can substitute in our integral representation, and obtain
where
Estimating this is quite ugly, so we defer it to later.
4. Applying the Residue Theorem
4.1. Primitive Characters
Exactly like before, we are going to use a contour to estimate the value of
Let be a large half-integer (so no zeros of
with
). We then re-route the integration path along the contour integral
During this process we pick up residues, which are the interesting terms.
First, assume that is primitive, so the functional equation applies and we get the information we want about zeros.
- If
, then so we pick up a residue of
corresponding to
This is the “main term”. Per laziness,
it is.
- Depending on whether
is odd or even, we detect the trivial zeros, which we can express succinctly by
Actually, I really ought to truncate this at
, but since I’m going to let
in a moment I really don’t want to take the time to do so; the difference is negligible.
- We obtain a residue of
at
, which we denote
, for
. Observe that if
is even, this is the constant term of
near
(but there is a pole of the whole function at
); otherwise it equals the value of
straight-out.
- If
is even then
itself has a zero, so we are in worse shape. We recall that
and notice that
so we pick up an extra residue of
. So, call this a bonus of
- Finally, the hard-to-understand zeros in the strip
. If
is a zero, then it contributes a residue of
. We only pick up the zeros with
in our rectangle, so we get a term
Letting we derive that
at least for primitive characters. Note that the sum over the zeros is not absolutely convergent without the restriction to (with it, the sum becomes a finite one).
4.2. Transition to nonprimitive characters
The next step is to notice that if modulo
happens to be not primitive, and is induced by
with modulus
, then actually
and
are not so different. Specifically, they differ by at most
and so our above formula in fact holds for any character , if we are willing to add an error term of
. This works even if
is trivial, and also
, so we will just simplify notation by omitting the tilde’s.
Anyways is piddling compared to all the other error terms in the problem, and we can swallow a lot of the boring residues into a new term, say
Thus we have
Unfortunately, the constant depends on
and cannot be absorbed. We will also estimate
in the error term party.
5. Distribution of Zeros
In order to estimate
we will need information on both the vertical and horizontal distribution of the zeros. Also, it turns out this will help us compute .
5.1. Applying Hadamard’s Theorem
Let be primitive modulo
. As we saw,
is entire. It also is easily seen to have order , since no term grows much more than exponentially in
(using Stirling to handle the
factor). Thus by Hadamard, we may put
Taking a logarithmic derivative and cleaning up, we derive the following lemma.
Lemma 7 (Hadamard Expansion of Logarithmic Derivative)
For any primitive character (possibly trivial) we have
Proof: One one hand, we have
On the other hand
Taking the derivative of both sides and setting them equal: we have on the left side
and on the right-hand side
Equating these gives the desired result.
This will be useful in controlling things later. The is a constant that turns out to be surprisingly annoying; it is tied to
from the contour, so we will need to deal with it.
5.2. A Bound on the Logarithmic Derivative
Frequently we will take the real part of this. Using Stirling, the short version of this is:
Lemma 8 (Logarithmic Derivative Bound)
Let and
be primitive (possibly trivial). Then
Proof: The claim is obvious for , since we can then bound the quantity by
due to the fact that the series representation is valid in that range. The second part with
follows from the first line, by noting that
. So it just suffices to show that
where and
is primitive.
First, we claim that . We use the following trick:
where the ends come from taking the logarithmic derivative directly. By switching with
, the claim follows.
Then, the lemma follows rather directly; the has miraculously canceled with
. To be explicit, we now have
and the first two terms contribute and
, respectively; meanwhile the term
is at most
, so it is absorbed.
Short version: our functional equation lets us relate to
for
(in fact it’s all we have!) so this gives the following corresponding estimate:
Lemma 9 (Far-Left Estimate of Log Derivative)
If and
we have
Proof:
We have
(the unsymmetric functional equation, which can be obtained from Legendre’s duplication formula). Taking a logarithmic derivative yields
Because we assumed , the tangent function is bounded as
is sufficiently far from any of its poles along the real axis. Also since
implies the
term is bounded. Finally, the logarithmic derivative of
contributes
according to Stirling. So, total error is
and this gives the conclusion.
5.3. Horizontal Distribution
I claim that:
Theorem 10 (Horizontal Distribution Bound)
Let be a character, possibly trivial or imprimitive. There exists an absolute constant
with the following properties:
- If
is complex, then no zeros are in the region
.
- If
is real, there are no zeros in the region
, with at most one exception; this zero must be real and simple.
Such bad zeros are called Siegel zeros, and I will denote them . The important part about this estimate is that it does not depend on
but rather on
. We need the relaxation to non-primitive characters, since we will use them in the proof of Landau’s Theorem.
Proof: First, assume is both primitive and nontrivial.
By the 3-4-1 lemma on we derive that
This is cool because we already know that
We now assume .
In particular, we now have (since for any zero
)
So we are free to throw out as many terms as we want.
If is primitive, then everything is clear. Let
be a zero. Then
where we have dropped all but one term for the second line, and all terms for the third line. If is not primitive but at least is not
, then we can replace
with the inducing
for a penalty of at most
just like earlier: is usually zero, so we just look at the differing terms! The Dirichlet series really are practically the same. (Here we have also used the fact that
, and
.)
Consequently, we derive using that
Selecting so that
, we thus obtain
If we select , we get
so
for some constant , initially only for primitive
.
But because the Euler product of the -function of an imprimitive character versus its primitive inducing character differ by a finite number of zeros on the line
it follows that this holds for all nontrivial complex characters.
Unfortunately, if we are unlucky enough that is trivial, then replacing
causes all hell to break loose. (In particular,
is real in this case!) The problem comes in that our new penalty has an extra
, so
Applied with , we get the weaker
If for some
then the
term will be at most
and we live to see another day. In other words, we have unconditionally established a zero-free region of the form
for any .
Now let’s examine . We don’t have the facilities to prove that there are no bad zeros, but let’s at least prove that the zero must be simple and real. By Hadamard at
, we have
where we no longer need the real parts since is real, and in particular the roots of
come in conjugate pairs. The left-hand side can be stupidly bounded below by
So
In other words,
Then, let , so
The rest is arithmetic; basically one finds that there can be at most one Siegel zero. In particular, since complex zeros come in conjugate pairs, that zero must be real.
It remains to handle the case that is the constant function giving
. For this, we observe that the
-function in question is just
. Thus, we can decrease the constant
to some
in such a way that the result holds true for
, which completes the proof.
5.4. Vertical Distribution
We have the following lemma:
Lemma 11 (Sum of Zeros Lemma)
For all real and primitive characters
(possibly trivial), we have
Proof: We already have that
and we take , noting that the left-hand side is bounded by a constant
. On the other hand,
and
as needed.
From this we may deduce that
Lemma 12 (Number of Zeros Nearby )
For all real and primitive characters
(possibly trivial), the number of zeros
with
is
.
In particular, we may perturb any given by
so that the distance between it and the nearest zero is at least
, for some absolute constant
.
From this, using an argument principle we can actually also obtain the following: For a real number , we have
is the number of zeros of
with imaginary part
. However, we will not need this fact.
6. Error Term Party
Up to now, has been arbitrary. Assume now
; thus we can now follow the tradition
so is just to the right of the critical line. This causes
. We assume also for convenience that
.
6.1. Estimating the Truncation Error
Recall that
We need to bound the right-hand side of
If , the log part is small, and this is bad. We have to split into three cases:
,
, and
. This is necessary because in the event that
(
is a prime power), then
needs to be handled differently.
We let and
be the nearest prime powers to
other than
itself. Thus this breaks our region to conquer into
So we have possibly a center term (if is a prime power, we have a term
), plus the far left interval and the far right interval. Let
for convenience.
- In the easy case, if
we have a contribution of
, which is piddling (less than
).
- Suppose
. If
for some integer
, then
by using the silly inequality
for
. So the contribution in total is at most
provided
.
- If
, we have
Hence in this case, we get an error at most
- The cases
and
give the same bounds as above, in the same way.
Finally, if for outside the interval mentioned above, we in fact have
, say, and so all terms contribute at most
(Recall had a simple pole at
, so near
it behaves like
.)
The sum of everything is . Hence, the grand total across all these terms is the horrible
provided .
6.2. Estimating the Contour Error
We now need to measure the error along the contour, taken from . Throughout assume
. Naturally, to estimate the integral, we seek good estimates on
For this we appeal to the Hadamard expansion. We break into a couple cases.
- First, let’s look at the integral when
, so
with
large. We bound the horizontal integral along these regions; by symmetry let’s consider just the top
Thus we want an estimate of
.
Lemma 13
Let
be such that
,
. Assume
is primitive (possibly trivial), and that
is not within
of any zeros of
. Then
Proof: Since we assumed that
we need not worry about
and so we obtain
and we eliminate
by computing
where
by Stirling (here we use the fact that
). For the terms where
we see that
So the contribution of the sum for
can be bounded by
, via the vertical sum lemma.
As for the zeros with smaller imaginary part, we at least have
and thus we can reduce the sum to just
Now by the assumption that
; so the terms of the sum are all at most
. Also, there are
zeros with imaginary part in that range. Finally, we recall that
is bounded; we can write it using its (convergent) Dirichlet series and then note it is at most
.
At this point, we perturbas described in vertical distribution so that the lemma applies, and use can then compute
- Next, for the integral
, we use the “far-left” estimate to obtain
So the contribution in this case is
.
- Along the horizontal integral, we can use the same bound
which vanishes as
.
So we only have two error terms, and
. The first is clearly larger, so we end with
6.3. The term 
We can estimate as follows:
Lemma 14
For primitive . we have
Proof: The idea is to look at . By subtraction, we obtain
Then at (eliminating the poles), we have
where the is
if
and
for
. Furthermore,
which is by our vertical distribution results, and similarly
This completes the proof.
Let be a Siegel zero, if any; for all the other zeros, we have that
. We now have two cases.
. Then
is complex and thus has no exceptional zeros; hence each of its zeros has
; since
is a zero of
if and only if
is a zero of
, it follows that all zeros of
are have
. Moreover, in the range
there are
zeros (putting
in our earlier lemma on vertical distribution).
Thus, total contribution of the sum is .
- If
, then
is real. The above argument goes through, except that we may have an extra Siegel zero at
; hence there will also be a special zero at
. We pull these terms out separately.Consequently,
By adjusting the constant, we may assume
if it exists.
7. Computing
and
7.1. Summing the Error Terms
We now have, for any
,
, and
modulo
possibly primitive or trivial, the equality
where
Assume now that
, and
is an integer (hence
). Then aggregating all the errors gives
where the sum over
now excludes the Siegel zero. We can omit the terms
, and also
Absorbing things into the error term,
7.2. Estimating the Sum Over Zeros
Now we want to estimate
We do this is the dumbest way possible: putting a bound on
and pulling it out.
For any non-Siegel zero, we have a zero-free region
, whence
Pulling this out, we can then estimate the reciprocals by using our differential:
Hence,
We select
for some constant
, and moreover assume
, then we obtain
7.3. Summing Up
We would like to sum over all characters
. However, we’re worried that there might be lots of Siegel zeros across characters. A result of Landau tells us this is not the case:
Theorem 15 (Landau)
If
and
are real nontrivial primitive characters modulo
and
, then for any zeros
and
we have
for some fixed absolute
. In particular, for any fixed
, there is at most one
with a Siegel zero.
Proof: The character
is not trivial, so we can put
Now we use a silly trick:
by “Simon’s Favorite Factoring Trick” (we use the deep fact that
, the analog of
). The upper bounds give now
and one may deduce the conclusion from here.
We now sum over all characters
as before to obtain
where
is the character with a Siegel zero, if it exists.
8. Siegel’s Theorem, and Finishing Off
The term with
is bad, and we need some way to get rid of it. We now appeal to Siegel’s Theorem:
Theorem 16 (Siegel’s Theorem)
For any
there is a
such that any Siegel zero
satisfies
Thus for a positive constant
, assuming
, letting
means
, so we obtain
Then
where
. This completes the proof of Dirichlet’s Theorem.
Slight typo: the entry title says “Dircihlet” instead of “Dirichlet”.
Also, the first sentence “In this post I will sketch a proof Dirichlet Theorem’s in the following form:” doesn’t make sense lol.
LikeLike
Thanks, I’ve fixed that. And yes, it was originally supposed to be a sketch but got longer than that…
LikeLike
No, what I meant was there was a missing “of” or something to that effect.
In any case, do you happen to know the “strongest” form of Dirichlet’s theorem that can be proven using elementary methods?
LikeLike
I don’t know of any elementary (non-analytic) methods that get useful forms of Dirichlet’s Theorem. The closest I know of is showing that there are infinitely many primes
, by plugging things in to the
th cyclotomic polynomial.
LikeLike
Isn’t this the Siegel-Walfisz Theorem? I thought Dirichlet’s Theorem was the one that stated that there are infinitely many primes in arithmetic progressions (excluding the trivial cases).
LikeLike
Fair point, you’re right of course. :) I usually just call it Dirichlet just because everyone knows immediately what it means, whereas I’m never heard of the name “Siegel-Walfisz” until now. (I hear it called PNT for Arithmetic Progressions, though.)
LikeLike
[…] I talked about last time on my blog, the key ingredients […]
LikeLike