Things Fourier

For some reason several classes at MIT this year involve Fourier analysis. I was always confused about this as a high schooler, because no one ever gave me the “orthonormal basis” explanation, so here goes. As a bonus, I also prove a form of Arrow’s Impossibility Theorem using binary Fourier analysis, and then talk about the fancier generalizations using Pontryagin duality and the Peter-Weyl theorem.

In what follows, we let {\mathbb T = \mathbb R/\mathbb Z} denote the “circle group”, thought of as the additive group of “real numbers modulo {1}”. There is a canonical map {e : \mathbb T \rightarrow \mathbb C} sending {\mathbb T} to the complex unit circle, given by {e(\theta) = \exp(2\pi i \theta)}.

Disclaimer: I will deliberately be sloppy with convergence issues, in part because I don’t fully understand them myself, and in part because I don’t care.

1. Synopsis

Suppose we have a domain {Z} and are interested in functions {f : Z \rightarrow \mathbb C}. Naturally, the set of such functions forms a complex vector space. We like to equip the set of such functions with a positive definite inner product. The idea of Fourier analysis is to then select an orthonormal basis for this set of functions, say {(e_\xi)_{\xi}}, which we call the characters; the indices {\xi} are called frequencies. In that case, since we have a basis, every function {f : Z \rightarrow \mathbb C} becomes a sum

\displaystyle  f(x) = \sum_{\xi} \widehat f(\xi) e_\xi(x)

where {\widehat f(\xi)} are complex coefficients of the basis; appropriately we call {\widehat f} the Fourier coefficients. The variable {x \in Z} is referred to as the physical variable. This is generally good because the characters are deliberately chosen to be nice “symmetric” functions, like sine or cosine waves or other periodic functions. Thus we decompose an arbitrarily complicated function into a sum of nice ones.

For convenience, we record a few facts about orthonormal bases.

Proposition 1 (Facts about orthonormal bases)

Let {V} be a complex Hilbert space with inner form {\left< -,-\right>} and suppose {x = \sum_\xi a_\xi e_\xi} and {y = \sum_\xi b_\xi e_\xi} where {e_\xi} are an orthonormal basis. Then

\displaystyle  \begin{aligned} \left< x,x \right> &= \sum_\xi |a_\xi|^2 \\ a_\xi &= \left< x, e_\xi \right> \\ \left< x,y \right> &= \sum_\xi a_\xi \overline{b_\xi}. \end{aligned}

2. Common Examples

2.1. Binary Fourier analysis on {\{\pm1\}^n}

Let {Z = \{\pm 1\}^n} for some positive integer {n}, so we are considering functions {f(x_1, \dots, x_n)} accepting binary values. Then the functions {Z \rightarrow \mathbb C} form a {2^n}-dimensional vector space {\mathbb C^Z}, and we endow it with the inner form

\displaystyle  \left< f,g \right> = \frac{1}{2^n} \sum_{x \in Z} f(x) \overline{g(x)}.

In particular,

\displaystyle  \left< f,f \right> = \frac{1}{2^n} \sum_{x \in Z} \left\lvert f(x) \right\rvert^2

is the average of the squares; this establishes also that {\left< -,-\right>} is positive definite.

In that case, the multilinear monomials form an orthonormal basis of {\mathbb C^Z}, that is, the polynomials

\displaystyle  \chi_S(x_1, \dots, x_n) = \prod_{s \in S} x_s.

Thus our frequency set is actually the subsets {S \subseteq \{1, \dots, n\}}. Thus, we have a decomposition

\displaystyle  f = \sum_{S \subseteq \{1, \dots, n\}} \widehat f(S) \chi_S.

Example 2 (An example of binary Fourier analysis)

Let {n = 2}. Then binary functions {\{ \pm 1\}^2 \rightarrow \mathbb C} have a basis given by the four polynomials

\displaystyle  1, \quad x_1, \quad x_2, \quad x_1x_2.

For example, consider the function {f} which is {1} at {(1,1)} and {0} elsewhere. Then we can put

\displaystyle  f(x_1, x_2) = \frac{x_1+1}{2} \cdot \frac{x_2+1}{2} = \frac14 \left( 1 + x_1 + x_2 + x_1x_2 \right).

So the Fourier coefficients are {\widehat f(S) = \frac 14} for each of the four subsets {S}.
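As a sanity check on Example 2, here is a minimal sketch that computes binary Fourier coefficients by brute force, using {\widehat f(S) = \left< f, \chi_S \right>}. The function names `binary_fourier` and `indicator_of_all_ones` are made up for this illustration, and exact rational arithmetic is used so that the values can be compared directly.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod


def binary_fourier(f, n):
    """Return {S: f_hat(S)} for f on {+1,-1}^n, where S is a tuple of 1-based indices."""
    points = list(product([1, -1], repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(1, n + 1), size):
            # f_hat(S) = <f, chi_S> = average of f(x) * chi_S(x); chi_S is real-valued.
            total = sum(Fraction(f(x)) * prod(x[s - 1] for s in S) for x in points)
            coeffs[S] = total / len(points)
    return coeffs


def indicator_of_all_ones(x):
    """The function from Example 2: 1 at (1, ..., 1) and 0 elsewhere."""
    return 1 if all(xi == 1 for xi in x) else 0


coeffs = binary_fourier(indicator_of_all_ones, 2)
for S, c in coeffs.items():
    print(S, c)
```

Each of the four printed coefficients should be 1/4, matching the expansion above.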

This notion is useful in particular for binary functions {f : \{\pm1\}^n \rightarrow \{\pm1\}}; for these functions (and products thereof), we always have {\left< f,f \right> = 1}.

It is worth noting that the frequency {\varnothing} plays a special role:

Exercise 3

Show that

\displaystyle  \widehat f(\varnothing) = \frac{1}{|Z|} \sum_{x \in Z} f(x).

2.2. Fourier analysis on finite groups {Z}

This is the Fourier analysis used in this post and this post. Here, we have a finite abelian group {Z}, and consider functions {Z \rightarrow \mathbb C}; this is a {|Z|}-dimensional vector space. The inner product is the same as before:

\displaystyle  \left< f,g \right> = \frac{1}{|Z|} \sum_{x \in Z} f(x) \overline{g(x)}.

Now here is how we generate the characters. We equip {Z} with a non-degenerate symmetric bilinear form

\displaystyle  Z \times Z \xrightarrow{\cdot} \mathbb T \qquad (\xi, x) \mapsto \xi \cdot x.

Experts may already recognize this as a choice of isomorphism between {Z} and its Pontryagin dual. This time the characters are given by

\displaystyle  \left( e_\xi \right)_{\xi \in Z} \qquad \text{where} \qquad e_\xi(x) = e(\xi \cdot x).

In this way, the set of frequencies is also {Z}, but the {\xi \in Z} play very different roles from the “physical” {x \in Z}. (It is not too hard to check these indeed form an orthonormal basis in the function space {\mathbb C^{\left\lvert Z \right\rvert}}, since we assumed that {\cdot} is non-degenerate.)

Example 4 (Cube roots of unity filter)

Suppose {Z = \mathbb Z/3\mathbb Z}, with the bilinear form given by {\xi \cdot x = (\xi x)/3}. Let {\omega = \exp(\frac 23 \pi i)} be a primitive cube root of unity. Note that

\displaystyle  e_\xi(x) = \begin{cases} 1 & \xi = 0 \\ \omega^x & \xi = 1 \\ \omega^{2x} & \xi = 2. \end{cases}

Then given {f : Z \rightarrow \mathbb C} with {f(0) = a}, {f(1) = b}, {f(2) = c}, we obtain

\displaystyle  f(x) = \frac{a+b+c}{3} \cdot 1 + \frac{a + \omega^2 b + \omega c}{3} \cdot \omega^x + \frac{a + \omega b + \omega^2 c}{3} \cdot \omega^{2x}.

In this way we derive that the transforms are

\displaystyle  \begin{aligned} \widehat f(0) &= \frac{a+b+c}{3} \\ \widehat f(1) &= \frac{a+\omega^2 b+ \omega c}{3} \\ \widehat f(2) &= \frac{a+\omega b+\omega^2c}{3}. \end{aligned}
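The formulas in Example 4 can be checked numerically. The sketch below assumes the pairing {\xi \cdot x = \xi x / N} on {\mathbb Z/N\mathbb Z} and computes coefficients as {\widehat f(\xi) = \left< f, e_\xi \right>} (Proposition 1); the helper names and the sample values of {a}, {b}, {c} are arbitrary choices for illustration.

```python
import cmath


def e(theta):
    """The canonical map from T to the unit circle: e(theta) = exp(2*pi*i*theta)."""
    return cmath.exp(2j * cmath.pi * theta)


def fourier_coefficients(values):
    """f_hat(xi) = (1/N) * sum_x f(x) * conj(e(xi*x/N)) for f given as a list on Z/N."""
    N = len(values)
    return [sum(values[x] * e(-xi * x / N) for x in range(N)) / N for xi in range(N)]


def reconstruct(coeffs):
    """f(x) = sum_xi f_hat(xi) * e(xi*x/N)."""
    N = len(coeffs)
    return [sum(coeffs[xi] * e(xi * x / N) for xi in range(N)) for x in range(N)]


a, b, c = 2.0, -1.0, 5.0  # arbitrary sample values of f(0), f(1), f(2)
f = [a, b, c]
f_hat = fourier_coefficients(f)

omega = e(1 / 3)
expected = [
    (a + b + c) / 3,
    (a + omega**2 * b + omega * c) / 3,
    (a + omega * b + omega**2 * c) / 3,
]
max_coeff_error = max(abs(u - v) for u, v in zip(f_hat, expected))
max_inversion_error = max(abs(u - v) for u, v in zip(reconstruct(f_hat), f))
print(max_coeff_error, max_inversion_error)
```

Both printed errors should be at the level of floating-point rounding.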

Exercise 5

Show that

\displaystyle  \widehat f(0) = \frac{1}{|Z|} \sum_{x \in Z} f(x).

Olympiad contestants may recognize the previous example as a “roots of unity filter”, which is exactly the point. For concreteness, suppose one wants to compute

\displaystyle  \binom{1000}{0} + \binom{1000}{3} + \dots + \binom{1000}{999}.

In that case, we can consider the function

\displaystyle  w : \mathbb Z/3 \rightarrow \mathbb C.

such that {w(0) = 1} but {w(1) = w(2) = 0}. By abuse of notation we will also think of {w} as a function {w : \mathbb Z \twoheadrightarrow \mathbb Z/3 \rightarrow \mathbb C}. Then the sum in question is

\displaystyle  \begin{aligned} \sum_n \binom{1000}{n} w(n) &= \sum_n \binom{1000}{n} \sum_{k=0,1,2} \widehat w(k) \omega^{kn} \\ &= \sum_{k=0,1,2} \widehat w(k) \sum_n \binom{1000}{n} \omega^{kn} \\ &= \sum_{k=0,1,2} \widehat w(k) (1+\omega^k)^{1000}. \end{aligned}

In our situation, we have {\widehat w(0) = \widehat w(1) = \widehat w(2) = \frac13}, and we have evaluated the desired sum. More generally, we can take any periodic weight {w} and use Fourier analysis in order to interchange the order of summation.
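To see the filter in action, the following sketch compares the direct sum with the filtered expression {\frac13 \sum_k (1+\omega^k)^n}. It relies on the observation that {1+\omega = e(1/6)} and {1+\omega^2 = e(-1/6)}, so the two complex terms add up to {2\cos(n\pi/3)}, which is an integer depending only on {n \bmod 6}. The function names are invented for this example.

```python
from math import comb


def filtered_sum_direct(n):
    """Sum of binom(n, k) over all k divisible by 3."""
    return sum(comb(n, k) for k in range(0, n + 1, 3))


def filtered_sum_via_filter(n):
    """(1/3) * ((1+1)^n + (1+omega)^n + (1+omega^2)^n), computed in exact integers."""
    # (1+omega)^n + (1+omega^2)^n = 2*cos(n*pi/3), tabulated by n mod 6.
    two_cos = [2, 1, -1, -2, -1, 1][n % 6]
    return (2**n + two_cos) // 3


direct = filtered_sum_direct(1000)
via_filter = filtered_sum_via_filter(1000)
print(direct == via_filter)
```

For {n = 1000} we have {1000 \equiv 4 \pmod 6}, so the sum comes out to {(2^{1000} - 1)/3}.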

Example 6 (Binary Fourier analysis)

Suppose {Z = \{\pm 1\}^n}, viewed as an abelian group under pointwise multiplication hence isomorphic to {(\mathbb Z/2\mathbb Z)^{\oplus n}}. Assume we pick the dot product defined by

\displaystyle  \xi \cdot x = \frac{1}{2} \sum_i \frac{\xi_i - 1}{2} \cdot \frac{x_i - 1}{2}

where {\xi = (\xi_1, \dots, \xi_n)} and {x = (x_1, \dots, x_n)}.

We claim this coincides with the first example we gave. Indeed, let {S \subseteq \{1, \dots, n\}} and let {\xi \in \{\pm1\}^n} be the element which is {-1} at positions in {S}, and {+1} at positions not in {S}. Then the character {\chi_S} from the previous example coincides with the character {e_\xi} in the new notation. In particular, {\widehat f(S) = \widehat f(\xi)}.

Thus Fourier analysis on a finite group {Z} subsumes binary Fourier analysis.

2.3. Fourier series for functions {L^2([-\pi, \pi])}

Now we consider the space {L^2([-\pi, \pi])} of square-integrable functions {[-\pi, \pi] \rightarrow \mathbb C}, with inner form

\displaystyle  \left< f,g \right> = \frac{1}{2\pi} \int_{[-\pi, \pi]} f(x) \overline{g(x)}.

Sadly, this is not a finite-dimensional vector space, but fortunately it is a Hilbert space so we are still fine. In this case, an orthonormal basis must allow infinite linear combinations, as long as the sum of squares is finite.

Now, it turns out in this case that

\displaystyle  (e_n)_{n \in \mathbb Z} \qquad\text{where}\qquad e_n(x) = \exp(inx)

is an orthonormal basis for {L^2([-\pi, \pi])}. Thus this time the frequency set {\mathbb Z} is infinite. So every function {f \in L^2([-\pi, \pi])} decomposes as

\displaystyle  f(x) = \sum_n \widehat f(n) \exp(inx)

for some complex coefficients {\widehat f(n)}.

This is a little worse than our finite examples: instead of a finite sum on the right-hand side, we actually have an infinite sum. This is because our set of frequencies is now {\mathbb Z}, which isn’t finite. In this case the {\widehat f} need not be finitely supported, but do satisfy {\sum_n |\widehat f(n)|^2 < \infty}.

Since the frequencies are indexed by the integers {n \in \mathbb Z}, we call this decomposition a Fourier series.

Exercise 7

Show once again

\displaystyle  \widehat f(0) = \frac{1}{2\pi} \int_{[-\pi, \pi]} f(x).

Often we require that the function {f} satisfies {f(-\pi) = f(\pi)}, so that {f} becomes a periodic function, and we can think of it as {f : \mathbb T \rightarrow \mathbb C}.

2.4. Summary

We summarize our various flavors of Fourier analysis in the following table.

\displaystyle  \begin{array}{llll} \hline \text{Type} & \text{Physical var} & \text{Frequency var} & \text{Basis functions} \\ \hline \textbf{Binary} & \{\pm1\}^n & \text{Subsets } S \subseteq \left\{ 1, \dots, n \right\} & \prod_{s \in S} x_s \\ \textbf{Finite group} & Z & \xi \in Z, \text{ choice of } \cdot, & e(\xi \cdot x) \\ \textbf{Fourier series} & \mathbb T \text{ or } [-\pi, \pi] & n \in \mathbb Z & \exp(inx) \\ \end{array}

In fact, we will soon see that all these examples are subsumed by Pontryagin duality for compact groups {G}.

3. Parseval and friends

The notion of an orthonormal basis makes several “big-name” results in Fourier analysis quite lucid. Basically, we can take every result from Proposition 1, translate it into the context of our Fourier analysis, and get a big-name result.

Corollary 8 (Parseval theorem)

Let {f : Z \rightarrow \mathbb C}, where {Z} is a finite abelian group. Then

\displaystyle  \sum_\xi |\widehat f(\xi)|^2 = \frac{1}{|Z|} \sum_{x \in Z} |f(x)|^2.

Similarly, if {f : [-\pi, \pi] \rightarrow \mathbb C} is square-integrable then its Fourier series satisfies

\displaystyle  \sum_n |\widehat f(n)|^2 = \frac{1}{2\pi} \int_{[-\pi, \pi]} |f(x)|^2.

Proof: Recall that {\left< f,f\right>} is equal to the square sum of the coefficients. \Box

Corollary 9 (Formulas for {\widehat f})

Let {f : Z \rightarrow \mathbb C}, where {Z} is a finite abelian group. Then

\displaystyle  \widehat f(\xi) = \frac{1}{|Z|} \sum_{x \in Z} f(x) \overline{e_\xi(x)}.

Similarly, if {f : [-\pi, \pi] \rightarrow \mathbb C} is square-integrable then its Fourier series is given by

\displaystyle  \widehat f(n) = \frac{1}{2\pi} \int_{[-\pi, \pi]} f(x) \exp(-inx).

Proof: Recall that in an orthonormal basis {(e_\xi)_\xi}, the coefficient of {e_\xi} in {f} is {\left< f, e_\xi\right>}. \Box

Note in particular what happens if we select {\xi = 0} in the above!

Corollary 10 (Plancherel theorem)

Let {f, g : Z \rightarrow \mathbb C}, where {Z} is a finite abelian group. Then

\displaystyle  \left< f,g \right> = \sum_{\xi \in Z} \widehat f(\xi) \overline{\widehat g(\xi)}.

Similarly, if {f, g : [-\pi, \pi] \rightarrow \mathbb C} are square-integrable then

\displaystyle  \left< f,g \right> = \sum_n \widehat f(n) \overline{\widehat g(n)}.

Proof: Guess! \Box
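For readers who prefer to see numbers, here is a small numerical check of Corollaries 8 and 10 on {Z = \mathbb Z/6\mathbb Z}, assuming the pairing {\xi \cdot x = \xi x/6}. The test functions `f` and `g` are arbitrary, and the helper names are made up for this sketch.

```python
import cmath


def fourier_coefficients(values):
    """f_hat(xi) = (1/N) * sum_x f(x) * exp(-2*pi*i*xi*x/N) on Z/N."""
    N = len(values)
    return [
        sum(values[x] * cmath.exp(-2j * cmath.pi * xi * x / N) for x in range(N)) / N
        for xi in range(N)
    ]


def inner_physical(f, g):
    """<f, g> = (1/N) * sum_x f(x) * conj(g(x))."""
    return sum(u * v.conjugate() for u, v in zip(f, g)) / len(f)


def inner_frequency(f_hat, g_hat):
    """sum_xi f_hat(xi) * conj(g_hat(xi)), with no normalizing factor."""
    return sum(u * v.conjugate() for u, v in zip(f_hat, g_hat))


# Arbitrary complex-valued test functions on Z/6.
f = [1 + 2j, 0j, -3 + 0j, 4j, 2 + 0j, -1 - 1j]
g = [2 + 0j, 1j, 1 + 0j, -1 + 0j, 0j, 3 - 2j]
f_hat, g_hat = fourier_coefficients(f), fourier_coefficients(g)

parseval_gap = abs(inner_physical(f, f) - inner_frequency(f_hat, f_hat))
plancherel_gap = abs(inner_physical(f, g) - inner_frequency(f_hat, g_hat))
print(parseval_gap, plancherel_gap)
```

Both gaps should vanish up to floating-point error.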

4. (Optional) Arrow’s Impossibility Theorem

As an application, we now prove a form of Arrow’s theorem. Consider {n} voters voting among {3} candidates {A}, {B}, {C}. Each voter specifies a tuple {v_i = (x_i, y_i, z_i) \in \{\pm1\}^3} as follows:

  • {x_i = 1} if voter {i} ranks {A} ahead of {B}, and {x_i = -1} otherwise.
  • {y_i = 1} if voter {i} ranks {B} ahead of {C}, and {y_i = -1} otherwise.
  • {z_i = 1} if voter {i} ranks {C} ahead of {A}, and {z_i = -1} otherwise.

Tacitly, we only consider {3! = 6} possibilities for {v_i}: we forbid “paradoxical” votes of the form {x_i = y_i = z_i} by assuming that people’s votes are consistent (meaning the preferences are transitive).

Then, we can consider a voting mechanism

\displaystyle  \begin{aligned} f : \{\pm1\}^n &\rightarrow \{\pm1\} \\ g : \{\pm1\}^n &\rightarrow \{\pm1\} \\ h : \{\pm1\}^n &\rightarrow \{\pm1\} \end{aligned}

such that {f(x_\bullet)} is the global preference of {A} vs. {B}, {g(y_\bullet)} is the global preference of {B} vs. {C}, and {h(z_\bullet)} is the global preference of {C} vs. {A}. We’d like to avoid situations where the global preference {(f(x_\bullet), g(y_\bullet), h(z_\bullet))} is itself paradoxical.

In fact, we will prove the following theorem:

Theorem 11 (Arrow Impossibility Theorem)

Assume that {(f,g,h)} always avoids paradoxical outcomes, and assume {\mathbf E f = \mathbf E g = \mathbf E h = 0}. Then {(f,g,h)} is either a dictatorship or anti-dictatorship: there exists a “dictator” {k} such that

\displaystyle  f(x_\bullet) = \pm x_k, \qquad g(y_\bullet) = \pm y_k, \qquad h(z_\bullet) = \pm z_k

where all three signs coincide.

The “independence of irrelevant alternatives” condition is built into the setup: {f} depends only on the {x_\bullet}, {g} only on the {y_\bullet}, and {h} only on the {z_\bullet}. The assumption {\mathbf E f = \mathbf E g = \mathbf E h = 0} provides symmetry (and e.g. excludes the possibility that {f}, {g}, {h} are constant functions which ignore voter input). Unlike the usual Arrow theorem, we do not assume that {f(+1, \dots, +1) = +1} (hence the possibility of anti-dictatorship).

To this end, we actually prove the following result:

Lemma 12

Assume the {n} voters vote independently at random among the {3! = 6} possibilities. The probability of a paradoxical outcome is exactly

\displaystyle  \frac14 + \frac14 \sum_{S \subseteq \{1, \dots, n\}} \left( -\frac13 \right)^{\left\lvert S \right\rvert} \left( \widehat f(S) \widehat g(S) + \widehat g(S) \widehat h(S) + \widehat h(S) \widehat f(S) \right) .

Proof: Define the Boolean function {D : \{\pm 1\}^3 \rightarrow \mathbb R} by

\displaystyle  D(a,b,c) = ab + bc + ca = \begin{cases} 3 & a,b,c \text{ all equal} \\ -1 & a,b,c \text{ not all equal}. \end{cases}

Thus paradoxical outcomes arise when {D(f(x_\bullet), g(y_\bullet), h(z_\bullet)) = 3}. Now, we compute that for randomly selected {x_\bullet}, {y_\bullet}, {z_\bullet} that

\displaystyle  \begin{aligned} \mathbf E D(f(x_\bullet), g(y_\bullet), h(z_\bullet)) &= \mathbf E \sum_S \sum_T \left( \widehat f(S) \widehat g(T) + \widehat g(S) \widehat h(T) + \widehat h(S) \widehat f(T) \right) \left( \chi_S(x_\bullet)\chi_T(y_\bullet) \right) \\ &= \sum_S \sum_T \left( \widehat f(S) \widehat g(T) + \widehat g(S) \widehat h(T) + \widehat h(S) \widehat f(T) \right) \mathbf E\left( \chi_S(x_\bullet)\chi_T(y_\bullet) \right). \end{aligned}

Now we observe that:

  • If {S \neq T}, then {\mathbf E \chi_S(x_\bullet) \chi_T(y_\bullet) = 0}, since if say {s \in S}, {s \notin T} then {x_s} affects the parity of the product with 50% either way, and is independent of any other variables in the product.
  • On the other hand, suppose {S = T}. Then

    \displaystyle  \chi_S(x_\bullet) \chi_T(y_\bullet) = \prod_{s \in S} x_sy_s.

    Note that {x_sy_s} is equal to {1} with probability {\frac13} and {-1} with probability {\frac23} (since {(x_s, y_s, z_s)} is uniform from {3!=6} choices, which we can enumerate). From this an inductive calculation on {|S|} gives that

    \displaystyle  \prod_{s \in S} x_sy_s = \begin{cases} +1 & \text{ with probability } \frac{1}{2}(1+(-1/3)^{|S|}) \\ -1 & \text{ with probability } \frac{1}{2}(1-(-1/3)^{|S|}). \end{cases}

    Consequently,

    \displaystyle  \mathbf E \left( \prod_{s \in S} x_sy_s \right) = \left( -\frac13 \right)^{|S|}.

Piecing this all together, we now have that

\displaystyle  \mathbf E D(f(x_\bullet), g(y_\bullet), h(z_\bullet)) = \sum_S \left( \widehat f(S) \widehat g(S) + \widehat g(S) \widehat h(S) + \widehat h(S) \widehat f(S) \right) \left( -\frac13 \right)^{|S|}.

Then, we obtain that

\displaystyle  \begin{aligned} &\mathbf E \frac14 \left( 1 + D(f(x_\bullet), g(y_\bullet), h(z_\bullet)) \right) \\ =& \frac14 + \frac14\sum_S \left( \widehat f(S) \widehat g(S) + \widehat g(S) \widehat h(S) + \widehat h(S) \widehat f(S) \right) \left( -\frac13 \right)^{|S|}. \end{aligned}

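By the definition of {D}, the quantity {\frac14(1+D)} equals {1} on paradoxical outcomes and {0} otherwise, so this expectation is exactly the probability of a paradoxical outcome. \Box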
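Lemma 12 is easy to test by brute force for small {n}. The sketch below takes {n = 3} voters and majority vote for all three of {f}, {g}, {h} (an illustrative choice, not something fixed by the lemma), enumerates all {6^3} vote profiles, and compares the empirical paradox probability with the Fourier formula. All helper names are made up, and subsets {S} are stored as tuples of 0-based indices.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# The 6 consistent votes (x_i, y_i, z_i): everything except (1,1,1) and (-1,-1,-1).
VALID_VOTES = [v for v in product([1, -1], repeat=3) if len(set(v)) > 1]


def majority(bits):
    return 1 if sum(bits) > 0 else -1


def fourier(f, n):
    """Binary Fourier coefficients {S: f_hat(S)} as exact rationals."""
    points = list(product([1, -1], repeat=n))
    return {
        S: sum(Fraction(f(x) * prod(x[s] for s in S)) for x in points) / len(points)
        for size in range(n + 1)
        for S in combinations(range(n), size)
    }


def paradox_probability_brute_force(f, g, h, n):
    profiles = list(product(VALID_VOTES, repeat=n))
    bad = 0
    for profile in profiles:
        xs, ys, zs = zip(*profile)  # split the voters' tuples into the x, y, z inputs
        if f(xs) == g(ys) == h(zs):  # all three global preferences equal: a paradox
            bad += 1
    return Fraction(bad, len(profiles))


def paradox_probability_formula(f, g, h, n):
    fh, gh, hh = fourier(f, n), fourier(g, n), fourier(h, n)
    total = sum(
        Fraction(-1, 3) ** len(S) * (fh[S] * gh[S] + gh[S] * hh[S] + hh[S] * fh[S])
        for S in fh
    )
    return Fraction(1, 4) + total / 4


n = 3
brute = paradox_probability_brute_force(majority, majority, majority, n)
formula = paradox_probability_formula(majority, majority, majority, n)
print(brute, formula)
```

Both values come out to 1/18, so majority rule among three voters produces a paradox about 5.6% of the time. Replacing `majority` with a dictator such as `lambda bits: bits[0]` gives probability 0 from both functions.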

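Now for the proof of the main theorem. Since {(f,g,h)} always avoids paradoxical outcomes, the probability in Lemma 12 is zero, so rearranging we see that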

\displaystyle  1 = \sum_{S \subseteq \{1, \dots, n\}} -\left( -\frac13 \right)^{\left\lvert S \right\rvert} \left( \widehat f(S) \widehat g(S) + \widehat g(S) \widehat h(S) + \widehat h(S) \widehat f(S) \right).

But now we can just use weak inequalities. We have {\widehat f(\varnothing) = \mathbf E f = 0} and similarly for {\widehat g} and {\widehat h}, so we restrict attention to {|S| \ge 1}. We then use the famous inequality {|ab+bc+ca| \le a^2+b^2+c^2} (which is true across all real numbers) to deduce that

\displaystyle  \begin{aligned} 1 &= \sum_{S \subseteq \{1, \dots, n\}} -\left( -\frac13 \right)^{\left\lvert S \right\rvert} \left( \widehat f(S) \widehat g(S) + \widehat g(S) \widehat h(S) + \widehat h(S) \widehat f(S) \right) \\ &\le \sum_{S \subseteq \{1, \dots, n\}} \left( \frac13 \right)^{\left\lvert S \right\rvert} \left( \widehat f(S)^2 + \widehat g(S)^2 + \widehat h(S)^2 \right) \\ &\le \sum_{S \subseteq \{1, \dots, n\}} \left( \frac13 \right)^1 \left( \widehat f(S)^2 + \widehat g(S)^2 + \widehat h(S)^2 \right) \\ &= \frac13 (1+1+1) = 1. \end{aligned}

with the last step by Parseval. So all inequalities must be sharp, and in particular {\widehat f}, {\widehat g}, {\widehat h} are supported on one-element sets, i.e. they are linear in inputs. As {f}, {g}, {h} are {\pm 1} valued, each {f}, {g}, {h} is itself either a dictator or anti-dictator function. Since {(f,g,h)} is always consistent, this implies the final result.

5. Pontryagin duality

In fact all the examples we have covered can be subsumed as special cases of Pontryagin duality, where we replace the domain with a general group {G}. In what follows, we assume {G} is a locally compact abelian (LCA) group, which just means that:

  • {G} is an abelian topological group,
  • the topology on {G} is Hausdorff, and
  • the topology on {G} is locally compact: every point of {G} has a compact neighborhood.

Notice that our previous examples fall into this category:

Example 13 (Examples of locally compact abelian groups)

  • Any finite abelian group {Z} with the discrete topology is LCA.
  • The circle group {\mathbb T} is LCA and also in fact compact.
  • The real numbers {\mathbb R} are an example of an LCA group which is not compact.

5.1. The Pontryagin dual

The key definition is:

Definition 14

Let {G} be an LCA group. Then its Pontryagin dual is the abelian group

\displaystyle  \widehat G \overset{\mathrm{def}}{=} \left\{ \text{continuous group homomorphisms } \xi : G \rightarrow \mathbb T \right\}.

The maps {\xi} are called characters. By equipping it with the compact-open topology, we make {\widehat G} into an LCA group as well.

Example 15 (Examples of Pontryagin duals)

  • {\widehat{\mathbb Z} \cong \mathbb T}.
  • {\widehat{\mathbb T} \cong \mathbb Z}. The characters are given by {\theta \mapsto n\theta} for {n \in \mathbb Z}.
  • {\widehat{\mathbb R} \cong \mathbb R}. This is because a nonzero continuous homomorphism {\mathbb R \rightarrow S^1} is determined by the fiber above {1 \in S^1}. (Covering projections, anyone?)
  • {\widehat{\mathbb Z/n\mathbb Z} \cong \mathbb Z/n\mathbb Z}, characters {\xi} being determined by the image {\xi(1) \in \mathbb T}.
  • {\widehat{G \times H} \cong \widehat G \times \widehat H}.
  • If {Z} is a finite abelian group, then the previous two examples (and the structure theorem for finite abelian groups) imply that {\widehat{Z} \cong Z}, though not canonically. You may now recognize that the bilinear form {\cdot : Z \times Z \rightarrow \mathbb T} is exactly a choice of isomorphism {Z \rightarrow \widehat Z}.
  • For any LCA group {G}, the dual of {\widehat G} is canonically isomorphic to {G}, id est there is a natural isomorphism

    \displaystyle  G \cong \widehat{\widehat G} \qquad \text{by} \qquad x \mapsto \left( \xi \mapsto \xi(x) \right).

    This is the Pontryagin duality theorem. (It is an analogy to the isomorphism {(V^\vee)^\vee \cong V} for vector spaces {V}.)

5.2. The orthonormal basis in the compact case

Now assume {G} is LCA but also compact, and thus has a unique Haar measure {\mu} such that {\mu(G) = 1}; this lets us integrate over {G}. Let {L^2(G)} be the space of square-integrable functions to {\mathbb C}, i.e.

\displaystyle  L^2(G) = \left\{ f : G \rightarrow \mathbb C \quad\text{such that}\quad \int_G |f|^2 \; d\mu < \infty \right\}.

Thus we can equip it with the inner form

\displaystyle  \left< f,g \right> = \int_G f\overline{g} \; d\mu.

In that case, we get all the results we wanted before:

Theorem 16 (Characters of {\widehat G} form an orthonormal basis)

Assume {G} is LCA and compact. Then {\widehat G} is discrete, and the characters

\displaystyle  (e_\xi)_{\xi \in \widehat G} \qquad\text{by}\qquad e_\xi(x) = e(\xi(x)) = \exp(2\pi i \xi(x))

form an orthonormal basis of {L^2(G)}. Thus for each {f \in L^2(G)} we have

\displaystyle  f = \sum_{\xi \in \widehat G} \widehat f(\xi) e_\xi

where

\displaystyle  \widehat f(\xi) = \left< f, e_\xi \right> = \int_G f(x) \exp(-2\pi i \xi(x)) \; d\mu.

The sum {\sum_{\xi \in \widehat G}} makes sense since {\widehat G} is discrete. In particular,

  • Letting {G = Z} gives “Fourier transform on finite groups”.
  • The special case {G = \mathbb Z/n\mathbb Z} has its own Wikipedia page.
  • Letting {G = \mathbb T} gives the “Fourier series” earlier.

5.3. The Fourier transform of the non-compact case

If {G} is LCA but not compact, then Theorem 16 becomes false. On the other hand, it is still possible to define a transform, but one needs to be a little more careful. The generic example to keep in mind in what follows is {G = \mathbb R}.

In what follows, we fix a Haar measure {\mu} for {G}. (This {\mu} is unique only up to scaling; since {\mu(G) = \infty}, there is no longer a canonical normalization.)

One considers this time the space {L^1(G)} of absolutely integrable functions. Then one directly defines the Fourier transform of {f \in L^1(G)} to be

\displaystyle  \widehat f(\xi) = \int_G f \overline{e_\xi} \; d\mu

imitating the previous definitions in the absence of an inner product. This {\widehat f} may not be {L^1}, but it is at least bounded. Then we manage to at least salvage:

Theorem 17 (Fourier inversion on {L^1(G)})

Take an LCA group {G} and fix a Haar measure {\mu} on it. One can select a unique dual measure {\widehat \mu} on {\widehat G} such that if {f \in L^1(G)}, {\widehat f \in L^1(\widehat G)}, the “Fourier inversion formula”

\displaystyle  f(x) = \int_{\widehat G} \widehat f(\xi) e_\xi(x) \; d\widehat\mu.

holds almost everywhere. It holds everywhere if {f} is continuous.

Notice the extra nuance of having to select measures, because it is no longer the case that {G} has a single distinguished measure.

Despite the fact that the {e_\xi} no longer form an orthonormal basis, the transformed function {\widehat f : \widehat G \rightarrow \mathbb C} is still often useful. In particular, the transform has special names for a few special {G}, which are listed in the summary table below.

5.4. Summary

In summary,

  • Given any LCA group {G}, we can transform sufficiently nice functions on {G} into functions on {\widehat G}.
  • If {G} is compact, then we have the nicest situation possible: {L^2(G)} is an inner product space with {\left< f,g \right> = \int_G f \overline{g} \; d\mu}, and the {e_\xi} form an orthonormal basis as {\xi} ranges over {\widehat G}.
  • If {G} is not compact, then we no longer get an orthonormal basis or even an inner product space, but it is still possible to define the transform

    \displaystyle  \widehat f : \widehat G \rightarrow \mathbb C

    for {f \in L^1(G)}. If {\widehat f} is also in {L^1(\widehat G)} we still get a “Fourier inversion formula” expressing {f} in terms of {\widehat f}.

We summarize our various flavors of Fourier analysis for various {G} in the following. In the first half {G} is compact, in the second half {G} is not.

\displaystyle  \begin{array}{llll} \hline \text{Name} & \text{Domain }G & \text{Dual }\widehat G & \text{Characters} \\ \hline \textbf{Binary Fourier analysis} & \{\pm1\}^n & S \subseteq \left\{ 1, \dots, n \right\} & \prod_{s \in S} x_s \\ \textbf{Fourier transform on finite groups} & Z & \xi \in \widehat Z \cong Z & e(\xi \cdot x) \\ \textbf{Discrete Fourier transform} & \mathbb Z/n\mathbb Z & \xi \in \mathbb Z/n\mathbb Z & e(\xi x / n) \\ \textbf{Fourier series} & \mathbb T \cong [-\pi, \pi] & n \in \mathbb Z & \exp(inx) \\ \hline \textbf{Continuous Fourier transform} & \mathbb R & \xi \in \mathbb R & e(\xi x) \\ \textbf{Discrete time Fourier transform} & \mathbb Z & \xi \in \mathbb T \cong [-\pi, \pi] & \exp(i \xi n) \\ \end{array}

You might notice that the various names are awful. This is part of the reason I got confused as a high school student: every type of Fourier transform above has its own Wikipedia article. If it were up to me, we would just use the term “{G}-Fourier transform”, and that would make everyone’s lives a lot easier.

6. Peter-Weyl

In fact, if {G} is a compact Lie group, even if {G} is not abelian we can still give an orthonormal basis of {L^2(G)} (the square-integrable functions on {G}). It turns out in this case the characters are attached to complex irreducible representations of {G} (and in what follows all representations are complex).

The result is given by the Peter-Weyl theorem. First, we need the following result:

Lemma 18 (Compact Lie groups have unitary reps)

Any finite-dimensional (complex) representation {V} of a compact Lie group {G} is unitary, meaning it can be equipped with a {G}-invariant inner form. Consequently, {V} is completely reducible: it splits into the direct sum of irreducible representations of {G}.

Proof: Suppose {B : V \times V \rightarrow \mathbb C} is any inner product. Equip {G} with a right-invariant Haar measure {dg}. Then we can equip {V} with an “averaged” inner form

\displaystyle  \widetilde B(v,w) = \int_G B(gv, gw) \; dg.

Then {\widetilde B} is the desired {G}-invariant inner form. Now, the fact that {V} is completely reducible follows from the fact that given a subrepresentation of {V}, its orthogonal complement is also a subrepresentation. \Box

The Peter-Weyl theorem then asserts that the finite-dimensional irreducible unitary representations essentially give an orthonormal basis for {L^2(G)}, in the following sense. Let {V = (V, \rho)} be such a representation of {G}, and fix an orthonormal basis {e_1}, \dots, {e_d} of {V} (where {d = \dim V}). The {(i,j)}th matrix coefficient for {V} is then given by

\displaystyle  G \xrightarrow{\rho} \mathop{\mathrm{GL}}(V) \xrightarrow{\pi_{ij}} \mathbb C

where {\pi_{ij}} is the projection onto the {(i,j)}th entry of the matrix. We abbreviate {\pi_{ij} \circ \rho} to {\rho_{ij}}. Then the theorem is:

Theorem 19 (Peter-Weyl)

Let {G} be a compact Lie group. Let {\Sigma} denote the (pairwise non-isomorphic) irreducible finite-dimensional unitary representations of {G}. Then

\displaystyle  \left\{ \sqrt{\dim V} \rho_{ij} \; \Big\vert \; (V, \rho) \in \Sigma, \text{ and } 1 \le i,j \le \dim V \right\}

is an orthonormal basis of {L^2(G)}.

Strictly, I should say {\Sigma} is a set of representatives of the isomorphism classes of irreducible unitary representations, one for each isomorphism class.

In the special case where {G} is abelian, all irreducible representations are one-dimensional. A one-dimensional representation of {G} is a map {G \rightarrow \mathop{\mathrm{GL}}(\mathbb C) \cong \mathbb C^\times}, but the unitary condition implies it is actually a map {G \rightarrow S^1 \cong \mathbb T}, i.e. it is an element of {\widehat G}.

18.099 Transcript: Bourgain’s Theorem

As part of the 18.099 Discrete Analysis reading group at MIT, I presented section 4.7 of Tao-Vu’s Additive Combinatorics textbook. Here were the notes I used for the second half of my presentation.

1. Synopsis

We aim to prove the following result.

Theorem 1 (Bourgain)

Assume {N \ge 2} is prime and {A, B \subseteq Z = \mathbb Z_N}. Assume that

\displaystyle  \delta \gg \sqrt{\frac{(\log \log N)^3}{\log N}}

is such that {\min\left\{ \mathbf P_ZA, \mathbf P_ZB \right\} \ge \delta}. Then {A+B} contains a proper arithmetic progression of length at least

\displaystyle  \exp\left( C\sqrt[3]{\delta^2 \log N} \right)

for some absolute constant {C > 1}.

The methods that we used with Bohr sets fail here, because in the previous half of yesterday’s lecture we took advantage of Parseval’s identity in order to handle large convolutions, always keeping two {\widehat 1_\ast} terms inside the {\sum} sign. When we work with {A+B} this causes us to be stuck. So, we instead use the technology of {\Lambda(p)} constants and dissociated sets.

2. Previous results

As usual, let {Z} denote a finite abelian group. Recall that

Definition 2

Let {S \subseteq Z} and {2 \le p \le \infty}. The {\Lambda(p)} constant of {S}, denoted {\left\lVert S \right\rVert_{\Lambda(p)}}, is defined as

\displaystyle  \left\lVert S \right\rVert_{\Lambda(p)} = \sup_{\substack{c : S \rightarrow \mathbb C \\ c \not\equiv 0}} \frac{\left\lVert \displaystyle\sum_{\xi \in S} c(\xi) e(\xi \cdot x) \right\rVert_{L^p(Z)}} {\left\lVert c \right\rVert_{\ell^2(S)}}.

Definition 3

If {S \subseteq Z}, we say {S} is a dissociated set if all {2^{|S|}} subset sums of {S} are distinct.
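To make Definition 3 concrete, here is a brute-force check of dissociativity, assuming {Z = \mathbb Z/N\mathbb Z} with elements represented as integers. The function name `is_dissociated` is invented for this sketch, and the enumeration is exponential in {|S|}, so it is only meant for tiny examples.

```python
from itertools import combinations


def is_dissociated(S, N):
    """Return True if all 2^|S| subset sums of S are distinct in Z/N."""
    S = list(S)
    seen = set()
    for size in range(len(S) + 1):
        for subset in combinations(S, size):
            total = sum(subset) % N
            if total in seen:
                return False  # two different subsets share the same sum
            seen.add(total)
    return True


# Powers of two: subset sums are distinct binary expansions 0..15, all below 101.
powers_of_two_ok = is_dissociated([1, 2, 4, 8], 101)
# Here 1 + 2 = 3, so the subsets {1, 2} and {3} collide.
collision_ok = is_dissociated([1, 2, 3], 101)
print(powers_of_two_ok, collision_ok)
```

The first set is dissociated and the second is not.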

For such sets we have Rudin’s inequality (yes, Walter), which states:

Lemma 4 (Rudin’s inequality)

If {S} is dissociated then

\displaystyle  \left\lVert S \right\rVert_{\Lambda(p)} \ll \sqrt p.

Dissociated sets come up via the so-called “cube covering lemma”:

Lemma 5 (Cube covering lemma)

Let {S \subseteq Z} and {d \ge 1}. Then we can partition

\displaystyle  S = D_1 \sqcup D_2 \sqcup \dots \sqcup D_k \sqcup R

such that

  • Each {D_i} is dissociated of size {d+1},
  • There exists {\eta_1}, {\dots}, {\eta_d} such that {R} is contained in a {d}-cube, i.e. it’s covered by {c_1\eta_1 + \dots + c_d\eta_d}, where {c_i \in \{-1,0,1\}}.

Finally, we remind the reader that

Lemma 6 (Parseval)

We have

\displaystyle  \left\lVert f \right\rVert_{L^2Z} = \left\lVert \widehat f \right\rVert_{\ell^2Z}.

Since we don’t have Bohr sets anymore, the way we detect progressions is to use the pigeonhole principle. In what follows, let {T^n f} be the shift of {f} by {n}, id est {T^nf(x) = f(x-n)}.

Proposition 7 (Pigeonhole gives arithmetic progressions)

Let {f : Z \rightarrow \mathbb R_{\ge 0}}, {J \ge 1} and suppose {r \in \mathbb Z} is such that

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr}f - f \right\rvert < \mathbf E_Z f.

Then {\text{supp }(f)} contains an arithmetic progression of length {J} and spacing {r}.

Proof: Apply the pigeonhole principle to find an {x} such that

\displaystyle  \max_{1 \le j \le J} \left\lvert T^{jr}f(x) - f(x) \right\rvert < f(x).

Then the claim follows. \Box
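The pigeonhole argument is constructive enough to run. The sketch below assumes {Z = \mathbb Z/N\mathbb Z} and an integer-valued nonnegative {f} given as a list; it checks the hypothesis of Proposition 7 and, if it holds, returns the points {x, x-r, \dots, x-Jr}, which all lie in the support because {f(x - jr) > 0} whenever {|f(x-jr) - f(x)| < f(x)}. The function name and the example are made up for illustration.

```python
from fractions import Fraction


def find_progression(f, r, J):
    """Return [x, x-r, ..., x-J*r] (mod N) inside supp(f), or None if the hypothesis fails."""
    N = len(f)

    def deviation(x):
        # max over 1 <= j <= J of |T^{jr} f(x) - f(x)|, where T^n f(x) = f(x - n)
        return max(abs(f[(x - j * r) % N] - f[x]) for j in range(1, J + 1))

    mean_deviation = Fraction(sum(deviation(x) for x in range(N)), N)
    mean_f = Fraction(sum(f), N)
    if not mean_deviation < mean_f:
        return None  # hypothesis of the proposition does not hold
    # Pigeonhole: the averages are strictly ordered, so some x has deviation(x) < f(x).
    for x in range(N):
        if deviation(x) < f[x]:
            return [(x - j * r) % N for j in range(J + 1)]
    return None


# Example: indicator of Z/11 minus the point 0, with spacing r = 1 and J = 2.
f = [0] + [1] * 10
progression = find_progression(f, r=1, J=2)
print(progression)
```

In this example the average deviation is 3/11, which is less than the average 10/11 of `f`, so a progression is found.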

3. Periodicity

Proposition 8 (Estimate for {\max_{h \in H} |T^hf|} for {\text{supp }(\widehat f)} dissociated)

Let {f : Z \rightarrow \mathbb R}, {\text{supp }(\widehat f) \subseteq S \subseteq Z} with {S} dissociated. Then for any set {H} with {|H| > 1} we have

\displaystyle  \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^2Z} \ll \sqrt{\log|H|} \left\lVert f \right\rVert_{L^2Z}.

Proof: Let {p > 2} be large and note

\displaystyle  \begin{aligned} \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^2Z} &\le \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^pZ} \\ &\le \left\lVert \left( \sum_{h \in H} \left\lvert T^h f \right\rvert^p \right)^{1/p} \right\rVert_{L^pZ} \\ &= \left( \mathbf E_Z \left( \sum_{h \in H} \left\lvert T^h f \right\rvert^p \right) \right)^{1/p} \\ &= \left( \sum_{h \in H} \mathbf E_Z \left\lvert T^h f \right\rvert^p \right)^{1/p} \\ &= \left( \sum_{h \in H} \mathbf E_Z \left\lvert f \right\rvert^p \right)^{1/p} \\ &= \left\lvert H \right\rvert^{1/p} \left\lVert \sum_\xi \widehat f(\xi) e(\xi \cdot x) \right\rVert_{L^pZ} \\ &\le \left\lvert H \right\rvert^{1/p} \left\lVert S \right\rVert_{\Lambda(p)} \left\lVert \widehat f \right\rVert_{\ell^2Z} \\ \end{aligned}

Then by Parseval and Rudin,

\displaystyle  \begin{aligned} \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^2Z} &\le \left\lvert H \right\rvert^{1/p} \left\lVert S \right\rVert_{\Lambda(p)} \left\lVert f \right\rVert_{L^2Z} \\ &\ll \left\lvert H \right\rvert^{1/p} \sqrt p \left\lVert f \right\rVert_{L^2Z}. \end{aligned}

We may then take {p \asymp \log |H|}. \Box
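
To see where {\sqrt{\log |H|}} comes from, one can minimize {|H|^{1/p} \sqrt p} over {p}. Setting the derivative of {\frac{\log|H|}{p} + \frac12 \log p} to zero gives {p = 2\log|H|}, and the value there is {\sqrt{2e \log |H|}}. A quick numerical sketch, with hypothetical function names and ignoring the implied constant in Rudin's inequality:

```python
import math

def rudin_bound(H_size, p):
    """The quantity |H|^(1/p) * sqrt(p) from the end of the proof."""
    return H_size ** (1.0 / p) * math.sqrt(p)

def best_p(H_size):
    """Minimizer of rudin_bound in p; exceeds 2 once |H| >= 3."""
    return 2.0 * math.log(H_size)

H_size = 1000
optimum = rudin_bound(H_size, best_p(H_size))
closed_form = math.sqrt(2 * math.e * math.log(H_size))
```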

We combine these two propositions into the following lemma which applies if {\widehat f} has nonzero values of “uniform” size.

Lemma 9 (Uniformity estimate for shifts)

Let {f : Z \rightarrow \mathbb R} and {J, d > 1}. Suppose that {\widehat f} is “uniform in size” across its support, in the sense that

\displaystyle  \frac {\sup_{\xi \in \text{supp }(\widehat f)} \left\lvert \widehat f(\xi) \right\rvert} {\inf_{\xi \in \text{supp }(\widehat f)} \left\lvert \widehat f(\xi) \right\rvert} \le 2016.

Then one can find {S \subseteq Z} such that {|S| = d} and for all {r \in Z},

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr}f - f \right\rvert \ll \left( \sum_\xi \left\lvert \widehat f(\xi) \right\rvert \right) \left( \sqrt{\frac{\log J}{d}} + Jd\max_{\eta \in S} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \right).

Proof: Use the cube covering lemma to put {\text{supp }(\widehat f) = D_1 \sqcup \dots \sqcup D_k \sqcup R} where {R} is contained in the cube of {S = \left\{ \eta_1, \dots, \eta_d \right\}} and {|D_i| = d+1} for {1 \le i \le k}. Accordingly, we decompose {f} over its Fourier transform as

\displaystyle  f = f_1 + \dots + f_k + g

by letting {\widehat f_i} be supported on {D_i} and {\widehat g} be supported on {R}.

First, we can bound the “leftover” bits in {R}:

\displaystyle  \begin{aligned} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} g - g \right\rvert &= \mathbf E_Z \max_{0 \le j \le J} \sum_{\xi \in R} \left\lvert \widehat f(\xi) \cdot (e(\xi \cdot (x+jr)) - e(\xi \cdot x)) \right\rvert \\ &\le \mathbf E_Z \max_{0 \le j \le J} \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \left\lvert (e(\xi \cdot (x+jr)) - e(\xi \cdot x)) \right\rvert \\ &\le \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) \max_{\substack{0 \le j \le J \\ \xi \in R}} \left\lvert (e(\xi \cdot (x+jr)) - e(\xi \cdot x)) \right\rvert \\ &\le \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) \max_{\substack{0 \le j \le J \\ \xi \in R}} \left\lvert e(\xi \cdot jr) - 1 \right\rvert \\ &\le \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) 2\pi \max_{\substack{0 \le j \le J \\ \xi \in R}} \left\lVert \xi \cdot jr \right\rVert_{\mathbb R/\mathbb Z} \end{aligned}

Since the {\xi \in R} are covered by a cube of {S = \left\{ \eta_1, \dots, \eta_d \right\}}, we get

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} g - g \right\rvert \le \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) 2\pi Jd \max_{\eta \in S} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z}.

Let’s then bound the contribution over each dissociated set. We’ll need both the assumption of uniformity and the proposition we proved for dissociated sets.

\displaystyle  \begin{aligned} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_i - f_i \right\rvert &\le 2\mathbf E_Z \max_{0 \le j \le J} \left\lvert T^{jr} f_i \right\rvert \\ &\le 2\left\lVert \max_{0 \le j \le J} \left\lvert T^{jr} f_i \right\rvert \right\rVert_{L^2Z} \\ &\ll \sqrt{\log(J)} \left\lVert f_i \right\rVert_{L^2Z} \\ &= \sqrt{\log(J)} \sqrt{\sum_{\xi \in D_i} \left\lvert \widehat f(\xi) \right\rvert^2 } \\ &\ll \sqrt{\frac{\log J}{d}} \sum_{\xi \in D_i} \left\lvert \widehat f(\xi) \right\rvert \end{aligned}

where the last step is by uniformity of {\widehat f} together with {|D_i| = d+1}. Now combine everything with the triangle inequality. \Box

4. Proof of main theorem

Without loss of generality {\mathbf P_ZA = \mathbf P_ZB = \delta}. Of course, we let {f = 1_A \ast 1_B} so {\mathbf E_Z f = \delta^2}. We will have parameters {d \ge 1}, {M \ge 1}, and {J \ge \exp(C\sqrt[3]{\delta^2 \log N})} which we will select at the end.

Our goal is to show there exists some integer {r} such that

\displaystyle  \mathbf E_Z \max_{1 \le j < J} \left\lvert T^{jr} f - f \right\rvert < \delta^2.

Now we cannot apply the uniformity estimate directly since {f} is probably not uniform, and therefore we impose a dyadic decomposition on the base group {Z}; let

\displaystyle  \begin{aligned} Z_0 &= \left\{ \xi \in Z \;:\; \frac{1}{2} \delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le \delta^2 \right\} \\ Z_1 &= \left\{ \xi \in Z \;:\; \frac14\delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le \frac{1}{2}\delta^2 \right\} \\ Z_2 &= \left\{ \xi \in Z \;:\; \frac18\delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le \frac14\delta^2 \right\} \\ &\vdots \\ Z_{M-1} &= \left\{ \xi \in Z \;:\; 2^{-M} \delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le 2^{-M+1} \delta^2 \right\} \\ Z_{\mathrm{err}} &= \left\{ \xi \in Z \;:\; \left\lvert \widehat f(\xi) \right\rvert < 2^{-M} \delta^2 \right\} \\ \end{aligned}

Then as before we can decompose via Fourier transform to obtain

\displaystyle  f = f_0 + f_1 + \dots + f_{M-1} + f_{\mathrm{err}}

so that {\widehat f_i} is supported on {Z_i}.
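
In code, the dyadic decomposition is just a bucketing of frequencies by the size of their coefficient. A minimal sketch, assuming the coefficients are given as a dictionary and that none exceeds `scale` (which plays the role of {\delta^2}); the function name and the `"err"` key are made up:

```python
def dyadic_buckets(fhat, scale, M):
    """Split frequencies by coefficient size.

    Bucket m (0 <= m < M) holds xi with scale/2^(m+1) < |fhat(xi)| <= scale/2^m.
    Everything else (in particular anything of size <= scale/2^M) goes to 'err'.
    """
    buckets = {m: [] for m in range(M)}
    buckets["err"] = []
    for xi, c in fhat.items():
        size = abs(c)
        for m in range(M):
            if scale / 2 ** (m + 1) < size <= scale / 2 ** m:
                buckets[m].append(xi)
                break
        else:
            buckets["err"].append(xi)
    return buckets

example = dyadic_buckets({0: 1.0, 1: 0.5, 2: 0.3, 3: 0.125, 4: 0.01}, 1.0, 3)
```

Within each non-error bucket the coefficient sizes differ by a factor of less than {2}, which is comfortably inside the uniformity hypothesis of the previous lemma.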

Now we can apply the previous lemma to get for each {0 \le m < M}:

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_m - f_m \right\rvert \ll \left( \sum_{\xi \in Z_m} \left\lvert \widehat f(\xi) \right\rvert \right) \left( \sqrt{\frac{\log J}{d}} + Jd\max_{\eta \in S_m} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \right)

for some {S_m}; hence by summing and using the fact that

\displaystyle  \sum_{\xi \in Z} \left\lvert \widehat f(\xi) \right\rvert = \sum_{\xi \in Z} \left\lvert \widehat 1_A(\xi) \right\rvert \left\lvert \widehat 1_B(\xi) \right\rvert \le \left\lVert \widehat 1_A \right\rVert_{\ell^2Z} \left\lVert \widehat 1_B \right\rVert_{\ell^2Z} = \left\lVert 1_A \right\rVert_{L^2Z} \left\lVert 1_B \right\rVert_{L^2Z} = \sqrt{\mathbf P_ZA \mathbf P_ZB} = \delta

we obtain that

\displaystyle  \sum_{0 \le m < M} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_m - f_m \right\rvert \ll \delta \left( \sqrt{\frac{\log J}{d}} + Jd\max_{\eta \in \bigcup S_m} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \right).

As for the “error” term, we bound

\displaystyle  \begin{aligned} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_{\mathrm{err}} - f_{\mathrm{err}} \right\rvert &\le 2\mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_{\mathrm{err}} \right\rvert \\ &\le 2\mathbf E_Z \sum_{1 \le j \le J} \left\lvert T^{jr} f_{\mathrm{err}} \right\rvert \\ &\le 2\sum_{1 \le j \le J} \mathbf E_Z \left\lvert T^{jr} f_{\mathrm{err}} \right\rvert \\ &\le 2\sum_{1 \le j \le J} \mathbf E_Z \left\lvert f_{\mathrm{err}} \right\rvert \\ &\le 2J \mathbf E_Z \left\lvert f_{\mathrm{err}} \right\rvert \\ &\le 2J \left\lVert f_{\mathrm{err}} \right\rVert_{L^2Z} \\ &= 2J \left\lVert \widehat f_{\mathrm{err}} \right\rVert_{\ell^2 Z} \\ &= 2J \sqrt{\sum_{\xi \in Z_{\mathrm{err}}} \left\lvert \widehat f_{\mathrm{err}}(\xi) \right\rvert^2} \\ &\le 2J \sqrt{\max_{\xi \in Z_{\mathrm{err}}} \left\lvert \widehat f_{\mathrm{err}}(\xi) \right\rvert \sum_{\xi \in Z_{\mathrm{err}}} \left\lvert \widehat f_{\mathrm{err}}(\xi) \right\rvert} \\ &\le 2J \sqrt{2^{-M}\delta^2 \cdot \delta} \\ &= 2J 2^{-M/2} \delta^{3/2} \\ &\le 2J 2^{-M/2} \delta. \end{aligned}

Thus, putting these all together, we need to find {r \neq 0} such that

\displaystyle  \sqrt{\frac{\log J}{d}} + Jd \max_{\eta\in\bigcup S_m} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} + 2J \cdot2^{-M/2} \ll \delta.

Now set {M \asymp \log J} and {d \asymp \delta^{-2} \log J}, so the first and third terms are less than {\frac13 c \delta}, since by hypothesis

\displaystyle  \delta \gg \sqrt{\frac{(\log \log N)^3}{\log N}}

from which we deduce

\displaystyle  J \gg \exp\left( C\sqrt[3]{\delta^2\log N} \right) \ge \exp\left( C\log \log N \right) = (\log N)^C \gg \delta^{-1}.

Thus it suffices that

\displaystyle  \max_{\eta\in S} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \ll \frac{\delta^3}{J \log J}

where {S = \bigcup S_m}. Note {\left\lvert S \right\rvert \le dM \ll \left( \frac{\log J}{\delta} \right)^2}. Now we recall the result that

\displaystyle  \left\lvert \text{Bohr }(S, \rho) \right\rvert \ge |Z| \rho^{|S|}

and so it suffices for us that

\displaystyle  N \cdot \left( \frac{c_1 \delta^3}{J \log J} \right) ^{c_2 \left( \delta^{-1} \log J \right)^2} > 1

for constants {c_1} and {c_2}. Then {J = \exp(C\sqrt[3]{\delta^2 \log N})} works now.

18.099 Transcript: Chang’s Theorem

As part of the 18.099 discrete analysis reading group at MIT, I presented section 4.7 of Tao-Vu’s Additive Combinatorics textbook. Here were the notes I used for the first part of my presentation.

1. Synopsis

In the previous few lectures we’ve worked hard at developing the notion of characters, Bohr sets, spectrums. Today we put this all together to prove some Szemerédi-style results on arithmetic progressions of {\mathbb Z_N}.

Recall that Szemerédi’s Theorem states that:

Theorem 1 (Szemerédi)

Let {k \ge 3} be an integer. Then for sufficiently large {N}, any subset of {\{1, \dots, N\}} with density at least

\displaystyle  \frac{1}{(\log \log N)^{2^{-2^{k+9}}}}

contains a length {k} arithmetic progression.

Notice that the density approaches zero as {N \rightarrow \infty} but it does so extremely slowly.

Our goal is to show much better results for sets like {2A-2A}, {A+B+C} or {A+B}. In this post we will prove:

Theorem 2 (Chang’s Theorem)

Let {K,N \ge 1} and let {A \subseteq Z = \mathbb Z_N}. Suppose {E(A,A) \ge |A|^3 / K}, and let

\displaystyle  d \ll K\left( 1+\log \frac{1}{\mathbf P_Z A} \right).

Then there is a proper symmetric progression {P \subseteq 2A-2A} of rank at most {d} and density

\displaystyle  \mathbf P_Z P \ge d^{-d}.

For example, the hypothesis holds whenever {|A \pm A| \le K|A|}, i.e. if {A} has small Ruzsa diameter. One can also always take {K = 1/\mathbf P_Z A}, but then {d} becomes quite large.

We also prove that

Theorem 3

Let {K,N \ge 1} and let {A, B, C \subseteq Z = \mathbb Z_N}. Suppose {|A|=|B|=|C| \ge \frac{1}{K}|A+B+C|} and now let

\displaystyle  d \ll K^2\left( 1+\log \frac{1}{\mathbf P_Z A} \right).

Then there is a proper symmetric progression {P \subseteq A+B+C} of rank at most {d} and

\displaystyle  \mathbf P_Z P \ge d^{-d}.

2. Main steps

Our strategy will take the following form. Let {S} be the set we want to study (for us, {S=2A-2A} or {S=A+B+C}). We proceed in four steps.

Step 1. Analyze the Fourier coefficients of {\widehat 1_S}. Note in particular the identities

\displaystyle  \begin{aligned} \left\lVert \widehat 1_A \right\rVert_{\ell^\infty(Z)} &= \mathbf P_Z A \\ \left\lVert \widehat 1_A \right\rVert_{\ell^2(Z)} &= \sqrt{\mathbf P_Z A} \\ \left\lVert \widehat 1_A \right\rVert_{\ell^4(Z)}^4 &= \frac{E(A,A)}{|Z|^3}. \end{aligned}

Recall also from the first section of Chapter 4 that

  • The support of {1_A \ast 1_B} is {A+B}.
  • {\widehat{f \ast g} = \widehat f \cdot \widehat g}.
  • {f(x) = \sum_\xi \widehat f(\xi) e(\xi \cdot x)}.
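
These norm identities are easy to confirm numerically on a small example. The sketch below assumes {Z = \mathbb Z_N}, the normalization {\widehat f(\xi) = \mathbf E_x f(x) e(-\xi x/N)}, and {E(A,A)} counted as the number of quadruples with {a+b=c+d}; the set {A} and all names are chosen only for illustration.

```python
import cmath

def fourier_indicator(A, N):
    """Coefficients of 1_A under the assumed normalization
    fhat(xi) = (1/N) * sum_{a in A} exp(-2*pi*i*xi*a/N)."""
    return [sum(cmath.exp(-2j * cmath.pi * xi * a / N) for a in A) / N
            for xi in range(N)]

def additive_energy(A, N):
    """Number of (a, b, c, d) in A^4 with a + b = c + d (mod N)."""
    counts = {}
    for a in A:
        for b in A:
            s = (a + b) % N
            counts[s] = counts.get(s, 0) + 1
    return sum(c * c for c in counts.values())

N, A = 17, {0, 1, 2, 5}
hat = fourier_indicator(A, N)
density = len(A) / N                           # P_Z A
sup_norm = max(abs(c) for c in hat)            # compare with P_Z A
l2_squared = sum(abs(c) ** 2 for c in hat)     # compare with P_Z A
l4_fourth = sum(abs(c) ** 4 for c in hat)      # compare with E(A,A) / N^3
energy_term = additive_energy(A, N) / N ** 3
```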

Step 2. Find a set of the form {\text{Bohr }(\text{Spec }_\alpha A, \rho)} contained completely inside {S}. Recall that by expanding definitions:

\displaystyle  \text{Bohr }(\text{Spec }_\alpha A, \rho) = \left\{ x \in Z \mid \sup_{\xi \; : \; \left\lvert \widehat 1_A(\xi) \right\rvert \ge \alpha \mathbf P_ZA} \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} < \rho \right\}.

Step 3. Use the triangle inequality and the Fourier concentration lemma (covering). Recall that this says:

Lemma 4 (Fourier Concentration, or “Covering Lemma”, Tao-Vu 4.36)

Let {A \subseteq Z}, and let {0 < \alpha \le 1}. Then one can pick {\eta_1}, {\dots}, {\eta_d} such that

\displaystyle  d \ll \frac{1 + \log \frac{1}{\mathbf P_ZA}}{\alpha^2}

and {\text{Spec }_\alpha A} is contained in a {d}-cube, i.e. it’s covered by {c_1\eta_1 + \dots + c_d\eta_d} where {c_i \in \{-1,0,1\}}.

Using such a {d}, we have by the triangle inequality

\displaystyle  \text{Bohr }\left(\{\eta_1, \dots, \eta_d\}, \frac{\rho}{d} \right) \subseteq \text{Bohr }\left( \text{Spec }_\alpha A, \rho \right).  \ \ \ \ \ (1)
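
The inclusion (1) can be checked exactly on a toy example. The sketch below assumes {Z = \mathbb Z_N} and uses the whole {d}-cube in place of {\text{Spec }_\alpha A}; since the spectrum sits inside the cube, the Bohr set of the cube is the smaller one, so this is enough. The generators and radius are arbitrary, and the names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def norm_RZ(m, N):
    """Distance from m/N to the nearest integer, as an exact fraction."""
    m %= N
    return Fraction(min(m, N - m), N)

def bohr(S, rho, N):
    """Bohr(S, rho) = {x in Z_N : ||xi * x / N|| < rho for every xi in S}."""
    return {x for x in range(N) if all(norm_RZ(xi * x, N) < rho for xi in S)}

def cube(etas, N):
    """All sums c_1*eta_1 + ... + c_d*eta_d (mod N) with c_i in {-1, 0, 1}."""
    return {sum(c * e for c, e in zip(cs, etas)) % N
            for cs in product((-1, 0, 1), repeat=len(etas))}

N, etas, rho = 101, [3, 10, 27], Fraction(1, 4)
small = bohr(etas, rho / len(etas), N)   # Bohr({eta_i}, rho/d)
large = bohr(cube(etas, N), rho, N)      # Bohr(cube, rho)
inclusion_holds = small <= large
```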

Step 4. We use the fact that Bohr sets contain long arithmetic progressions:

Theorem 5 (Bohr sets have long coset progressions, Tao-Vu 4.23)

Let {Z = \mathbb Z_N}. Then within {\text{Bohr }(S, r)} one can select a proper symmetric progression {P} such that

\displaystyle  \mathbf P_Z P \ge \left( \frac{r}{|S|} \right)^{|S|}

and {\text{rank } P \le |S|}.

The third step is necessary because in the bound for the preceding theorem, the dependence on {|S|} is much more severe than the dependence on {r}. Therefore it is necessary to use the Fourier concentration lemma in order to reduce the size of {|S|} before applying the result.

3. Proof of Chang’s theorem

First, we do the first two steps in the following proposition.

Proposition 6

Let {A \subseteq Z}, {0 < \alpha \le 1}. Assume {E(A,A) \ge 4\alpha^2 |A|^3}. Then

\displaystyle  \text{Bohr }\left(\text{Spec }_\alpha A, \frac 16\right) \subseteq 2A-2A.

Proof: To do this, as advertised consider

\displaystyle  f(x) = 1_A \ast 1_A \ast 1_{-A} \ast 1_{-A}(x).

We want to show that any {x \in \text{Bohr }(\text{Spec }_\alpha A, \frac 16)} lies in the support of {f}. Note that if {x} does lie in this Bohr set, we have

\displaystyle  \text{Re } e(\xi \cdot x) \ge \frac{1}{2} \qquad \forall \xi \in \text{Spec }_\alpha A.

We aim to show now {f(x) > 0}. This follows by computing

\displaystyle  \begin{aligned} f(x) &= 1_A \ast 1_A \ast 1_{-A} \ast 1_{-A}(x) \\ &= \sum_\xi \widehat 1_A(\xi)^2 \widehat 1_{-A}(\xi)^2 e(\xi \cdot x) \\ &= \sum_\xi |\widehat 1_A(\xi)|^4 e(\xi \cdot x) \end{aligned}

Now we split the sum over {\text{Spec }_\alpha A}:

\displaystyle  \begin{aligned} f(x) &= \sum_{\xi \in \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 e(\xi \cdot x) + \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 e(\xi \cdot x). \end{aligned}

Now we take the real part of both sides:

\displaystyle  \begin{aligned} \text{Re } f(x) &\ge \sum_{\xi \in \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 \cdot \frac{1}{2} - \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 \\ &= \frac{1}{2} \sum_{\xi} |\widehat 1_A(\xi)|^4 - \frac32 \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 \\ &= \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4. \end{aligned}

By definition of {\text{Spec }_\alpha A} we can bound two of the {\left\lvert \widehat 1_A(\xi) \right\rvert}’s via

\displaystyle  \begin{aligned} \text{Re } f(x) &\ge \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 (\alpha\mathbf P_Z A)^2 \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^2 \end{aligned}

Now the last sum is the square of the {\ell^2} norm, hence

\displaystyle  \begin{aligned} \text{Re } f(x) &\ge \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 (\alpha\mathbf P_Z A)^2 \cdot \mathbf P_ZA \\ &\ge \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 \alpha^2 \frac{|A|^3}{|Z|^3} > 0 \end{aligned}

by the assumption {E(A,A) \ge 4\alpha^2 |A|^3}. \Box
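
The proposition can be tested on a concrete set. The sketch below assumes {Z = \mathbb Z_N}, the normalization {\widehat f(\xi) = \mathbf E_x f(x) e(-\xi x/N)}, and picks {\alpha} so that {E(A,A) = 4\alpha^2|A|^3} holds with equality; the function names and the small floating-point tolerance are illustrative choices, not part of the statement.

```python
import cmath
import math

def additive_energy(A, N):
    """Number of (a, b, c, d) in A^4 with a + b = c + d (mod N)."""
    counts = {}
    for a in A:
        for b in A:
            s = (a + b) % N
            counts[s] = counts.get(s, 0) + 1
    return sum(c * c for c in counts.values())

def check_prop6(A, N):
    """Return (Bohr(Spec_alpha A, 1/6), 2A - 2A) for the largest admissible alpha."""
    A = set(A)
    alpha = math.sqrt(additive_energy(A, N) / (4 * len(A) ** 3))
    hat = [sum(cmath.exp(-2j * cmath.pi * xi * a / N) for a in A) / N
           for xi in range(N)]
    threshold = alpha * len(A) / N  # alpha * P_Z A
    # Tolerance only ever enlarges Spec, which shrinks the Bohr set.
    spec = [xi for xi in range(N) if abs(hat[xi]) >= threshold - 1e-12]
    # ||xi * x / N|| < 1/6, tested exactly in integers.
    bohr = {x for x in range(N)
            if all(6 * min(xi * x % N, N - xi * x % N) < N for xi in spec)}
    target = {(a + b - c - d) % N for a in A for b in A for c in A for d in A}
    return bohr, target

bohr_set, two_a_minus_two_a = check_prop6(range(10), 101)
contained = bohr_set <= two_a_minus_two_a
```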

Now, let {\alpha = \frac{1}{2\sqrt K}}, and let

\displaystyle  d \ll \frac{1 + \log \frac{1}{\mathbf P_Z A}}{\alpha^2} \ll K\left( 1 + \log \frac{1}{\mathbf P_Z A} \right).

Then by (1), we have

\displaystyle  \text{Bohr }\left(\{\eta_1, \dots, \eta_d\}, \frac{1}{6d} \right) \subseteq \text{Bohr }\left( \text{Spec }_\alpha A, \frac16 \right) \subseteq 2A-2A.

and then using the main result on Bohr sets, we can find a symmetric progression of density at least

\displaystyle  \left( \frac{1/6d}{d} \right)^d = (6d^2)^{-d} = d^{-O(d)}

and whose rank is at most {d}. Enlarging the implied constant in {d} turns this into the bound {d^{-d}}, which completes the proof of Chang’s theorem.

4. Proof of the second theorem

This time, the Bohr set we want to use is:

Proposition 7

Let {\alpha = \frac{1}{2\pi K}}. Then

\displaystyle  \text{Bohr }\left(\text{Spec }_\alpha A, \frac{1}{2\pi K}\right) \subseteq A+B+C.

Proof: Let {f = 1_A \ast 1_B \ast 1_C}. Note that we have {\mathbf P_Z(A+B+C) \le K\mathbf P_Z A}, while {\mathbf E_Z f = (\mathbf P_ZA)^3}. So by shifting {C}, we may assume without loss of generality that

\displaystyle  f(0) \ge \frac{(\mathbf P_ZA)^3}{K\mathbf P_ZA} \ge \frac{1}{K} (\mathbf P_ZA)^2.

Now, consider {x} in the Bohr set. Then we have

\displaystyle  \begin{aligned} \left\lvert f(x)-f(0) \right\rvert &= \left\lvert \sum_\xi \widehat1_A(\xi) \widehat1_B(\xi) \widehat1_C(\xi) \left( e(\xi \cdot x) - 1 \right) \right\rvert \\ &\le \sum_\xi \left\lvert \widehat 1_A(\xi) \right\rvert \left\lvert \widehat 1_B(\xi) \right\rvert \left\lvert \widehat 1_C(\xi) \right\rvert \left\lvert e(\xi \cdot x) - 1 \right\rvert \\ &\le 2\pi \sum_\xi \left\lvert \widehat 1_A(\xi) \right\rvert \left\lvert \widehat 1_B(\xi) \right\rvert \left\lvert \widehat 1_C(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z}. \end{aligned}

Bounding by the maximum for {A}, and then using Cauchy-Schwarz,

\displaystyle  \begin{aligned} \left\lvert f(x)-f(0) \right\rvert &\le 2\pi \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) \sum_\xi \left\lvert \widehat 1_B(\xi) \right\rvert \left\lvert \widehat 1_C(\xi) \right\rvert \\ &\le 2\pi \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) \sqrt{ \sum_\xi \left\lvert \widehat 1_B(\xi) \right\rvert^2 \sum_\xi \left\lvert \widehat 1_C(\xi) \right\rvert^2} \\ &\le 2\pi \mathbf P_Z A \cdot \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) \end{aligned}

Claim: if {x \in \text{Bohr }(\text{Spec }_\alpha A, \frac{1}{2\pi K})} and {\xi \in Z} then

\displaystyle  \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} < \frac{1}{2\pi K} \mathbf P_ZA

Indeed one just considers two cases:

  • If {\xi \in \text{Spec }_\alpha A}, then {\left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} < \alpha} ({x} in Bohr set) and {\left\lvert \widehat1_A(\xi) \right\rvert \le \mathbf P_ZA}.
  • If {\xi \notin \text{Spec }_\alpha A}, then {\left\lvert \widehat 1_A(\xi) \right\rvert < \alpha \mathbf P_ZA} ({\xi} outside Spec) and {\left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \le 1}.

So finally, we have

\displaystyle  \left\lvert f(x)-f(0) \right\rvert \le 2\pi \mathbf P_Z A \cdot \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) < \frac{(\mathbf P_ZA)^2}{K} \le f(0)

and this implies {f(x) \neq 0}. \Box

Once more by (1), we have

\displaystyle  \text{Bohr }\left(\{\eta_1, \dots, \eta_d\}, \frac{1}{2\pi Kd} \right) \subseteq \text{Bohr }\left( \text{Spec }_\alpha A, \frac{1}{2\pi K} \right) \subseteq A+B+C

where

\displaystyle  d \ll \frac{1+\log \frac{1}{\mathbf P_ZA}}{\alpha^2} \ll K^2\left( 1 + \log \frac{1}{\mathbf P_Z A} \right).

So by the main theorem on Bohr sets again, there is a symmetric progression of density at least

\displaystyle  \left( \frac{\frac{1}{2\pi Kd}}{d} \right)^d = (2\pi K d^2)^{-d} \ge d^{-O(d)}

and rank at most {d}, where we may take {d \ge K}. As before, enlarging the implied constant in {d} gives the bound {d^{-d}}. This completes the proof of the second theorem.