Cayley-Hamilton Theorem and Jordan Canonical Form

October 28th, 2009

I was discussing the Cayley-Hamilton Theorem last week with my officemates Hu Fu and Ashwin. The theorem is the following: given an {n \times n} matrix {A}, we can define its characteristic polynomial by {p_A(\lambda) = \det(A - I\lambda)}. The Cayley-Hamilton Theorem says that {p_A(A) = 0}. The polynomial is something like:

\displaystyle p_A(x) = a_k x^k + a_{k-1} x^{k-1} + \hdots + a_1 x^1 + a_0

so we can just see it as a formal polynomial and think of:

\displaystyle p_A(A) = a_k A^k + a_{k-1} A^{k-1} + \hdots + a_1 A + a_0 I

which is an {n \times n} matrix. The theorem says it is the zero matrix. We thought about it for a while and looked at Wikipedia, where there were a few proofs, but not the one-line proof I was looking for.
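Before getting to the proof, here is a quick numerical sanity check of the statement, a minimal sketch assuming numpy is available (numpy's np.poly returns the coefficients of {\det(\lambda I - A)}, which differs from {p_A} only by a sign, so it also vanishes at {A}):

```python
import numpy as np

n = 5
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))

# Coefficients of det(lambda*I - A), from the highest degree down to the constant term.
coeffs = np.poly(A)

# Evaluate the polynomial at the matrix A itself using Horner's rule.
P = np.zeros((n, n))
for c in coeffs:
    P = P @ A + c * np.eye(n)

print(np.max(np.abs(P)))  # essentially zero, up to floating-point error
```

Later, I got this proof, which I sent to Hu Fu: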

Write the matrix {A} in the basis of its eigenvectors; then we can write {A = \Gamma^t D \Gamma}, where {D} is the diagonal matrix with the eigenvalues on the main diagonal. Powers of {A} are then easy to compute:

\displaystyle A^k = (\Gamma^t D \Gamma) \hdots (\Gamma^t D \Gamma) = \Gamma^t D^k \Gamma

and since {D = \text{diag}(\lambda_1, \hdots, \lambda_n)} we have {D^k = \text{diag}(\lambda_1^k, \hdots, \lambda_n^k)}. Now, it is simple to see that:

\displaystyle p_A(A) = \Gamma^t p(D) \Gamma

and, since each {\lambda_i} is a root of the characteristic polynomial:

\displaystyle p(D) = \begin{bmatrix} p_A(\lambda_1) & & & \\ & p_A(\lambda_2) & & \\ & & \ddots & \\ & & & p_A(\lambda_n) \end{bmatrix} = \begin{bmatrix} 0 & & & \\ & 0 & & \\ & & \ddots & \\ & & & 0 \end{bmatrix} = 0

And that was the one-line proof. An even simpler proof is: let {v_i} be the eigenvectors; then {p_A(A)v_i = p_A(\lambda_i)v_i = 0}, so {p_A(A)} must be {0} since it sends every element of a basis to zero. Well, I sent that to Hu Fu and he told me the proof had a bug. Not really a bug, but I was proving it only for symmetric matrices, or more generally, only for diagonalizable matrices. He showed me, for example, the matrix:

\displaystyle  \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}

which has only one eigenvalue, {0}, and whose eigenvectors are all of the form {(x,0)} for {x \in {\mathbb R}}. So the dimension of the space spanned by the eigenvectors is {1}, less than the dimension of the matrix. This never happens for symmetric matrices, and I guess after some time as a computer scientist I got used to working only with symmetric matrices for almost everything I use: metrics, quadratic forms, correlation matrices, … but there is more out there than just symmetric matrices. The good news is that this proof is not hard to fix for the general case.

First, it is easy to prove that for each root of the characteristic polynomial there is at least one eigenvector associated to it (just note that {\det(A - \lambda I) = 0}, and therefore there must be some {v \in \ker(A - \lambda I) \setminus \{ 0 \}}). So if all the roots are distinct, then there is a basis of eigenvectors, and therefore the matrix is diagonalizable (notice that we may need to use complex eigenvalues, but that is ok). The good thing is that a matrix having two identical eigenvalues is a “coincidence”. We can identify matrices with {{\mathbb R}^{n^2}}. The matrices with repeated eigenvalues form a zero-measure subset of {{\mathbb R}^{n^2}}: they are the zero set of a polynomial in the entries {x_{ij}}, namely the resultant {R_{p,p'}} of the characteristic polynomial and its derivative. Therefore, we proved the Cayley-Hamilton theorem on the complement of a zero-measure set in {{\mathbb R}^{n^2}}. Since {A \mapsto p_A(A)} is a continuous function, the identity extends naturally to all matrices {A \in {\mathbb R}^{n^2}}.

We can also interpret that probabilistically: take a matrix {U} where each {U_{ij}} is drawn uniformly at random from {[0,1]}. Then {A + \epsilon U} has, with probability {1}, all distinct eigenvalues. So {p_{A+\epsilon U} (A+\epsilon U) = 0} with probability {1}. Now, just let {\epsilon \rightarrow 0}.
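Here is a small numerical illustration of that perturbation argument, again just a sketch assuming numpy: we perturb the non-diagonalizable matrix from above by {\epsilon U} and watch {p_{A+\epsilon U}(A+\epsilon U)} stay at zero as {\epsilon \rightarrow 0}.

```python
import numpy as np

def char_poly_at(M):
    """Evaluate the characteristic polynomial of M at M itself (Horner's rule)."""
    n = M.shape[0]
    P = np.zeros_like(M)
    for c in np.poly(M):               # coefficients of det(lambda*I - M)
        P = P @ M + c * np.eye(n)
    return P

A = np.array([[0.0, 1.0],              # the non-diagonalizable example from above
              [0.0, 0.0]])
U = np.random.default_rng(1).uniform(size=(2, 2))

for eps in [1e-1, 1e-3, 1e-6, 0.0]:
    B = A + eps * U                    # for eps > 0, B has distinct eigenvalues with probability 1
    print(eps, np.max(np.abs(char_poly_at(B))))   # all essentially zero
```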

Ok, this proves the theorem for real and complex matrices, but what about a matrix defined over a general field, where we can't use those continuity arguments? A way to get around that is to use the Jordan Canonical Form, which is a generalization of the eigenvector decomposition. Not all matrices have an eigenvector decomposition, but all matrices over an algebraically closed field can be written in Jordan Canonical Form. Given any {A \in \overline{K}^{n^2}} there is an invertible matrix {\Gamma \in \overline{K}^{n^2}} so that:

\displaystyle A = \Gamma^{-1} \begin{bmatrix} B_1 & & & \\ & B_2 & & \\ & & \ddots & \\ & & & B_p \end{bmatrix} \Gamma

where {B_i} are blocks of the form:

\displaystyle B_i = \begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}

By the same argument as above, we just need to prove Cayley-Hamilton for each block separately, that is, we need to prove that {p_A(B_i) = 0}. If the block has size {1}, this is exactly the proof above. If the block is bigger, then we need to look at what {B_i^k} looks like. By inspection:

\displaystyle B_i^2 = \begin{bmatrix} \lambda^2 & 2 \lambda & 1 & & \\ & \lambda^2 & 2 \lambda & 1 & \\ & & \ddots & \ddots & \ddots \\ & & & \lambda^2 & 2\lambda \\ & & & & \lambda^2 \end{bmatrix}

More generally, each row of {B_i^k} contains, starting at the diagonal, the sequence {\lambda^k, \binom{k}{1} \lambda^{k-1}, \binom{k}{2} \lambda^{k-2}, \hdots}, i.e., the entry {j} positions above the diagonal is {\frac{1}{j!} \frac{d^j}{d\lambda^j} \lambda^k}. So we have

\displaystyle p(B_i) = \begin{bmatrix} p(\lambda) & p'(\lambda) & \frac{p''(\lambda)}{2!} & \frac{p'''(\lambda)}{3!} & \hdots \\ & p(\lambda) & p'(\lambda) & \frac{p''(\lambda)}{2!} & \hdots \\ & & p(\lambda) & p'(\lambda) & \hdots \\ & & & p(\lambda) & \hdots \\ & & & & \ddots \end{bmatrix}

If block {B_i} has size {k}, then {\lambda_i} has multiplicity at least {k} in {p(\cdot)}, and therefore {p(\lambda_i) = p'(\lambda_i) = \hdots = p^{(k-1)}(\lambda_i) = 0}, and therefore {p(B_i) = 0}, as we wanted to prove.
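Here is a symbolic check of this block computation, a sketch assuming sympy is available: it builds a {4 \times 4} Jordan block, verifies that {(x - \lambda)^4} annihilates it, and checks that the entry {j} positions above the diagonal of {q(B)} is {q^{(j)}(\lambda)/j!} for a sample polynomial {q}.

```python
import sympy as sp

lam, x = sp.symbols('lam x')
k = 4
# A k x k Jordan block: lam on the diagonal, 1 on the superdiagonal.
B = sp.Matrix(k, k, lambda i, j: lam if i == j else (1 if j == i + 1 else 0))

def poly_at_matrix(poly, M):
    """Evaluate a polynomial in x at the matrix M via Horner's rule."""
    P = sp.zeros(M.rows)
    for c in sp.Poly(poly, x).all_coeffs():
        P = P * M + c * sp.eye(M.rows)
    return P

# (x - lam)^k has lam as a root of multiplicity k, so it annihilates the block.
print(poly_at_matrix((x - lam)**k, B).applyfunc(sp.expand))   # the zero matrix

# The derivative pattern: entry (0, j) of q(B) equals q^{(j)}(lam) / j!.
q = x**5 + 3*x**2 + 7
qB = poly_at_matrix(q, B)
for j in range(k):
    assert sp.expand(qB[0, j] - sp.diff(q, x, j).subs(x, lam) / sp.factorial(j)) == 0
```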

It turned out not to be a very short proof after all, but it is still short, since it uses mostly elementary facts and the argument is really intuitive in some sense. I took some lessons from that: (i) it reinforces my idea that if I need to say something about a matrix, the first thing to do is to look at its eigenvector decomposition; a lot of linear algebra problems become very simple when we consider things in the right basis, and normally the right basis is the eigenvector basis. (ii) Not all matrices are diagonalizable, but in those cases the Jordan Canonical Form comes to our aid and we can do almost the same as we did with the eigenvalue decomposition.

Hats, codes and puzzles

October 3rd, 2009

When I was a child someone told me the following problem:

A king promised to marry his daughter to the most intelligent man. Three princes came to claim her hand, and he tried the following logic experiment on them: the princes are gathered into a room and seated in a line, one behind the other, and are shown 2 black hats and 3 white hats. They are blindfolded, and 1 hat is placed on each of their heads, with the remaining hats hidden in a different room. The first one to deduce his hat color will marry the princess. If a prince claims his hat color incorrectly, he dies.

The prince seated in the back removes his blindfold, sees the two hats in front of him and says nothing. Then the prince in the middle removes his blindfold, sees the hat of the prince in front of him and also says nothing. Noticing that the other princes said nothing, the prince seated at the front, without even removing his blindfold, gives the correct answer. The question is: what color did he say?

This is a simple logical puzzle: we just write down all the possibilities and start ruling them out given that the other princes didn't answer, and in the end we can find the color of his hat. I remember that this puzzle surprised me a lot as a kid. I found it extremely cool back then, which made me want to read books about logic problems. After that, I had a lot of fun reading the books by Raymond Smullyan. I usually would read a problem, think something like “there can't be a solution to that”, then go to school with the problem in mind and spend the day thinking about it. Here is a problem I liked a lot:

There is one prisoner and there are two doors, each with one guardian. One door leads to an exit and one door leads to death. The prisoner can choose one door to open. One guardian speaks only the truth and the other always lies, but you don't know which door is which, which guardian is which, or who guards each door. You are allowed to choose one guardian and ask him one Yes/No question, and then you need to choose a door. What is the right question to ask?


But my goal is not to talk about logic puzzles, but about hat problems. There are a lot of variations of the problem above: in all of them a person is allowed to see the other hats but not his own hat, and we need to “guess” the color of our own hat. If we think carefully, we will see that this is a very general kind of problem in computer science: (i) the whole goal of learning theory is to predict one thing from a lot of other things you observe; (ii) in error-correcting codes, we need to guess one bit from all the others, or from some subset of the others; (iii) in universal truthful mechanisms, we need to make a price offer to one player that depends only on the other players' bids. I'll come back to this last example in a later post, since it is what made me interested in this kind of problem, but for now, let's look at one puzzle I was told by David Malec at EC'09:

There are black and white hats and {3} people: for each of them we choose one color independently at random with probability {\frac{1}{2}}. Now they can look at each other's hats but not at their own hat. Then each of them needs to write on a piece of paper either “PASS” or one color. If all pass or if someone writes a wrong color, the whole team loses (this is a team game), and if at least one person gets the color right and no one gets it wrong, the whole team wins. Create a strategy for the team to win with probability {\frac{3}{4}}.

Winning with probability {\frac{1}{2}} is easy: one person always writes “BLACK” and the others “PASS”. A better strategy is the following: if a person sees two hats of equal color, he writes the opposite color; otherwise, he passes. It is easy to see that the team wins except in the case where all hats are the same color, which happens with probability {\frac{1}{4}}.
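Here is a quick Monte Carlo check of that strategy, just a Python sketch: each of the three players guesses the opposite color when the two hats he sees agree, and passes otherwise; the team should win in roughly {3/4} of the trials.

```python
import random

def play_once():
    hats = [random.randint(0, 1) for _ in range(3)]
    guesses = []
    for i in range(3):
        others = [hats[j] for j in range(3) if j != i]
        if others[0] == others[1]:
            guesses.append(1 - others[0])   # the two visible hats agree: write the opposite color
        else:
            guesses.append(None)            # pass
    right = any(g == hats[i] for i, g in enumerate(guesses) if g is not None)
    wrong = any(g != hats[i] for i, g in enumerate(guesses) if g is not None)
    return right and not wrong

trials = 100_000
print(sum(play_once() for _ in range(trials)) / trials)   # roughly 0.75
```

We would like to extend this to a more general setting: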

There are black and white hats and {2^k - 1} people: for each of them we choose one color independently at random with probability {\frac{1}{2}}. Now they can look at each other's hats but not at their own hat. Then each of them needs to write on a piece of paper either “PASS” or one color. If all pass or if someone writes a wrong color, the whole team loses (this is a team game), and if at least one person gets the color right and no one gets it wrong, the whole team wins. Create a strategy for the team to win with probability {1-\frac{1}{2^k}}.

It is a tricky question how to extend the above solution to that case. A detailed solution can be found here. The idea is quite ingenious, so I'll sketch it here. It involves error-correcting codes, in this case the Hamming code. Let {F = \{0,1\}} with sum and product modulo {2}, let {w_1, \hdots, w_{2^k-1}} be the non-zero vectors of {F^k}, and consider the following linear map:

\displaystyle \begin{aligned} \phi: F^{2^k-1} \rightarrow F^k \\ (a_1,\hdots, a_{2^k-1}) \mapsto \sum_i a_i w_i \end{aligned}

Let {H} be the kernel of that map. Then it is not hard to see that {H, H+e_1, \hdots, H+e_{2^k-1}} is a partition of {F^{2^k-1}}, and, because of that fact, for each {x \in F^{2^k-1}} either {x \in H} or there exists a unique {i} s.t. {x + e_i \in H}. This gives a guessing strategy for the players. Let {x \in F^{2^k-1}} be the color vector of the hats. Player {i} sees this vector as:

\displaystyle (x_1, \hdots, x_{i-1}, ?, x_{i+1}, \hdots, x_{2^k-1})

which can be {(x_1, \hdots, x_{i-1}, 0, x_{i+1}, \hdots, x_{2^k-1})} or {(x_1, \hdots, x_{i-1}, 1, x_{i+1}, \hdots, x_{2^k-1})}. The strategy is: if one of those vectors is in {H}, write the color corresponding to the other vector; if both are out of {H}, pass. The team wins iff {x \notin H}, which happens with probability {1 - \frac{1}{2^k}}. It is an easy and fun exercise to prove those facts, or you can refer to the solution I just wrote.
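Here is a sketch of that strategy in Python, encoding each vector of {F^k} as an integer and using XOR for the sum, together with a Monte Carlo check that the team wins with probability about {1 - \frac{1}{2^k}}.

```python
import random

k = 3
n = 2**k - 1                            # number of players

# w[i] encodes the i-th non-zero vector of F^k as the integer i + 1.
w = [i + 1 for i in range(n)]

def guess(i, hats):
    """Player i's guess given everyone's hats (hats[i] is never read)."""
    s = 0
    for j in range(n):
        if j != i and hats[j] == 1:
            s ^= w[j]                   # syndrome of the visible hats
    if s == 0:
        return 1                        # completing with 0 would land in H, so bet on 1
    if s == w[i]:
        return 0                        # completing with 1 would land in H, so bet on 0
    return None                         # both completions are outside H: pass

def play_once():
    hats = [random.randint(0, 1) for _ in range(n)]
    gs = [guess(i, hats) for i in range(n)]
    right = any(g == hats[i] for i, g in enumerate(gs) if g is not None)
    wrong = any(g != hats[i] for i, g in enumerate(gs) if g is not None)
    return right and not wrong

trials = 200_000
print(sum(play_once() for _ in range(trials)) / trials)   # roughly 1 - 1/2^k = 0.875
```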


Now, we can complicate it a bit more: we can add other colors and other distributions. But I wanted to move to a different problem: the paper Derandomization of Auctions showed me an impressive thing: we can use coding theory to derandomize algorithms. To illustrate their ideas, they propose the following problem:

Color guessing problem: there are {n} people wearing hats of {k} different colors. Each person can see everyone else's hats but not his or her own, and each person needs to guess the color of his own hat. We want a deterministic guessing procedure that guesses at least a {1/k} fraction of each color class correctly.

The problem is very easy if we have a source of random bits. Each person guesses some color at random. It seems very complicated to do that without random bits. Surprisingly, we will solve that using a flow computation:

Let {c = (c_1, \hdots, c_n)} be an array of colors and {c_{-i}} the array with the {i}-th color removed. Consider the following flow network: nodes {s} and {t} (source and sink) and nodes {v_{c_{-i}}} for each {c_{-i}}; there are {n \cdot k^{n-1}} such nodes. Consider also nodes of the form {u_{\gamma, c}}, where {\gamma} is a color ({1, \hdots, k}) and {c} is a color vector; there are {k^{n+1}} such nodes.

[Figure: the flow network described above]

We have edges from {s} to each node {v_{c_{-i}}}, and edges from each node {u_{\gamma, c}} to {t}. Now, if {c = (\gamma, c_{-i})}, i.e., if {c_{-i}} completed in the {i}-th coordinate with {\gamma} generates {c}, then we add an edge from {v_{c_{-i}}} to {u_{\gamma, c}}.

Consider the following flow: add {1} unit of flow from {s} to each {v_{c_{-i}}}, and at {v_{c_{-i}}} split that flow into pieces of size {1/k}, sending one piece to {u_{\gamma, c}} for each {c = (\gamma, c_{-i})}. Then each node {u_{\gamma, c}} receives {\frac{n_\gamma(c)}{k}} units of flow, where {n_{\gamma}(c)} is the number of occurrences of {\gamma} in {c}. Send all that flow to {t}.

We can think of that flow as the guessing procedure: when person {i} sees {c_{-i}}, he chooses his guess uniformly at random, and this way each {u_{\gamma, c}} receives in expectation {\frac{n_\gamma(c)}{k}} correct guesses of {\gamma}. Notice that an integral flow in that graph represents a deterministic guessing procedure, so all we need is an integral flow in which the flow from {u_{\gamma, c}} to {t} is {\lfloor n_\gamma (c) / k \rfloor}. The flow into {u_{\gamma, c}} comes from nodes of the type {v_{c_{-i}}}, and each unit of it means that person {i}, looking at the other hats in {c}, correctly guesses {c_i}; so in scenario {c}, at least {\lfloor n_\gamma (c) / k \rfloor} people whose hat color is {\gamma} guess correctly.

Now define the capacities: all edges from {s} to {v_{c_{-i}}} and from {v_{c_{-i}}} to {u_{\gamma, c}} have capacity {1}, and the edge from {u_{\gamma, c}} to {t} has capacity {\lfloor n_\gamma (c) / k \rfloor}. The fractional flow above (scaled down where it exceeds those capacities) saturates all the edges into {t}, and since all capacities are integral there is also an integral maximum flow that saturates them. That integral flow gives us a deterministic guessing procedure.
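To make this concrete, here is a sketch for small {n} and {k}, assuming the networkx library is available: it builds the network above, computes an integral maximum flow, checks that every edge into {t} is saturated, and verifies the resulting deterministic guessing rule.

```python
from itertools import product

import networkx as nx
from networkx.algorithms.flow import edmonds_karp   # augmenting paths: integral flow for integral capacities

n, k = 4, 2                                          # n people, k colors

def complete(c_mi, i, gamma):
    """Insert color gamma in position i of the partial color vector c_{-i}."""
    return c_mi[:i] + (gamma,) + c_mi[i:]

G = nx.DiGraph()
for i in range(n):
    for c_mi in product(range(k), repeat=n - 1):
        v = ('v', i, c_mi)
        G.add_edge('s', v, capacity=1)
        for gamma in range(k):
            G.add_edge(v, ('u', gamma, complete(c_mi, i, gamma)), capacity=1)
for c in product(range(k), repeat=n):
    for gamma in range(k):
        G.add_edge(('u', gamma, c), 't', capacity=c.count(gamma) // k)   # floor(n_gamma(c)/k)

flow_value, flow = nx.maximum_flow(G, 's', 't', flow_func=edmonds_karp)
assert flow_value == sum(d['capacity'] for _, _, d in G.in_edges('t', data=True))

def guess(i, c_mi):
    """Person i's deterministic guess when seeing c_{-i}: follow the unit of flow, if any."""
    for gamma in range(k):
        if flow[('v', i, c_mi)][('u', gamma, complete(c_mi, i, gamma))] > 0:
            return gamma
    return 0                                         # no flow through this node: guess arbitrarily

# In every scenario c, at least floor(n_gamma(c)/k) people of each color class are correct.
for c in product(range(k), repeat=n):
    for gamma in range(k):
        correct = sum(1 for i in range(n)
                      if c[i] == gamma and guess(i, c[:i] + c[i + 1:]) == gamma)
        assert correct >= c.count(gamma) // k
print("deterministic guessing rule verified for n =", n, "and k =", k)
```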

In the next blog post, I’ll try to show the result in the Derandomization of Auctions that relates that to competitive auctions.

Using expanders to prove sum-product inequalities in finite fields

September 23rd, 2009

I am happy to have the first guest post of BigRedBits written by Igor Gorodezky about an elegant and exciting result in combinatorics.

————————-

I’m fortunate enough to be spending this semester in beautiful Los Angeles as a core participant in the 2009 long program on combinatorics at IPAM (an NSF-funded math institute on UCLA’s campus). We’ve recently wrapped up the first part of the program, which consisted of tutorial lectures on various topics in combinatorics. There was an abundance of gorgeous mathematics, and with Renato’s kind permission I’d like to hijack his blog and write about what to me was one of the most memorable lectures.

This was a lecture by Jozsef Solymosi of the University of British Columbia describing some of his recent work on the sum-product problem in finite fields. In particular, he outlined a spectral-graph-theoretic proof of a recent sum-product inequality due to Garaev. Solymosi’s proof is an extremely clever and elegant application of spectral graph theory to a classical problem in combinatorial number theory, and so I thought I’d present it here. Before stating the result and giving Solymosi’s proof, let us begin with a very brief introduction to the sum-product problem.

1. Introduction

Given a finite set of real numbers {A}, define the sum set

\displaystyle  A+A = \{a+b \mid a,b \in A\}

and the product set

\displaystyle  A \cdot A = \{ab \mid a,b \in A\}.

Both the sum set and the product set clearly must have cardinality between {|A|} and {|A|^2}. Observe that if {A} is an arithmetic progression then {|A+A| \approx 2|A|} while {|A \cdot A| \approx |A|^2}, whereas if {A} is a geometric progression then {|A \cdot A| \approx 2|A|} while {|A + A| \approx |A|^2}. Intuition suggests that keeping {|A+A|} small by giving {A} lots of additive structure inevitably blows up {|A \cdot A|}, while keeping {|A \cdot A|} small by giving {A} lots of multiplicative structure in turn blows up {|A+A|}. For an arbitrary {A}, one would expect at least one of these sets, if not both, to be fairly large.
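A quick numerical illustration of this intuition (a Python sketch): for an arithmetic progression the sum set is as small as possible and the product set is large, and the roles are reversed for a geometric progression.

```python
n = 50
ap = [3 * i + 1 for i in range(n)]      # an arithmetic progression
gp = [2 ** i for i in range(n)]         # a geometric progression

def sum_set(A):
    return {a + b for a in A for b in A}

def product_set(A):
    return {a * b for a in A for b in A}

for name, A in [("AP", ap), ("GP", gp)]:
    print(name, "|A| =", len(A),
          "|A+A| =", len(sum_set(A)),
          "|A.A| =", len(product_set(A)))
# AP: |A+A| = 2n - 1 = 99, while |A.A| is on the order of n^2
# GP: |A.A| = 2n - 1 = 99, while |A+A| = n(n+1)/2 = 1275
```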

Estimating the maximum of {|A+A|} and {|A \cdot A|} is the sum-product problem. It was posed in a paper by Erdos and Szemeredi, who proved the existence of a small constant {c} such that

\displaystyle  \max\{ |A+A|, |A \cdot A| \} = \Omega\left( |A|^{1+c} \right)

for any finite {A}. They conjecture that we actually have

\displaystyle  \max\{ |A+A|, |A \cdot A| \} = \Omega\left( |A|^{2-\delta} \right)

for any {\delta > 0} and sufficiently large {|A|}. In other words, the value of {c} in their bound can be made arbitrarily close to 1.

Much ink has been spilled in attempts to push up the value of {c}. At present, the best sum-product bound is due to Solymosi and gives {c \approx 3/11}. As an aside, I want to mention an extremely simple and elegant probabilistic proof of Elekes that gives {c=1/4}; it is detailed in Alon and Spencer’s classic text The Probabilistic Method, and is an absolute gem (look in the chapter on crossing numbers of graphs).

2. The finite field variant

Solymosi’s IPAM lecture was not, however, on this original sum-product problem, but rather on its natural finite field analogue: if {A} is a subset of {\mathbb F_p}, the finite field of prime order {p}, what can we say about the maximum of {|A+A|} and {|A \cdot A|}? Observe that it is important to consider fields whose order is prime and not the power of a prime, for in the latter case we could take {A} to be a subring and end up with the degenerate case {|A|=|A+A|=|A \cdot A|}.

Bourgain, Katz and Tao got the party started in 2004 by proving the following sum-product bound.

Theorem 1 For all {\epsilon > 0} there exists {\delta > 0} such that if {A \subseteq \mathbb F_p } with {p^{\epsilon} < |A| < p^{1-\epsilon}} then

\displaystyle  \max\{ |A+A|, |A \cdot A| \} = \Omega\left( |A|^{1+\delta} \right).

We note that the implied constant also depends on {\epsilon}. The best known bound is the following, due to Garaev (2007).

Theorem 2 If {A \subseteq \mathbb F_p} then

\displaystyle  \max\{ |A+A|, |A \cdot A| \} = \Omega\left( \min\left\{ |A|^{1/2}p^{1/2},|A|^2p^{-1/2} \right\} \right).

To illustrate the theorem, observe that if {|A| \approx p^{2/3}} then {\max\{ |A+A|, |A \cdot A| \} = \Omega( |A|^{5/4})}. It is this theorem that Solymosi proves using spectral graph theory (Garaev’s original proof went by way of Fourier analysis and the estimation of exponential sums).

3. Solymosi’s proof

In this section we give Solymosi’s proof of Theorem 2 (we will assume familiarity with basic facts about eigenvalues and eigenvectors of adjacency matrices). Let us first establish some notation. Consider a {d}-regular graph {G} and let the eigenvalues of its adjacency matrix be

\displaystyle  d=\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n.

As usual, we define

\displaystyle  \lambda(G) = \max\{ \lambda_2,|\lambda_n|\}.

Recall that if {G} is connected and non-bipartite then {\lambda(G)} is strictly smaller than {d}, and such a graph is an expander if {d-\lambda(G)} is bounded below by some constant.

We will make use of a fundamental result in spectral graph theory: the expander mixing lemma.

Lemma 3 Let {G} be a {d}-regular graph with {n} vertices. For all {S,T \subseteq V(G)} we have

\displaystyle  \left| e(S,T) - \frac{d}{n}|S||T| \right| \leq \lambda(G) \sqrt{|S||T|}

where {e(S,T)} is the number of edges with one endpoint in {S} and the other in {T}.

The proof of the lemma is straightforward; we omit it because of space considerations, but it can be found in the survey on expanders by Linial, Hoory, and Wigderson.
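As a quick sanity check of the lemma, here is a numerical sketch (assuming numpy and networkx): take a random regular graph as the expander, pick two disjoint vertex sets, and compare both sides of the inequality.

```python
import networkx as nx
import numpy as np

n, d = 80, 8
G = nx.random_regular_graph(d, n, seed=1)          # a random d-regular graph
A = nx.to_numpy_array(G, nodelist=range(n))

eigs = np.sort(np.linalg.eigvalsh(A))
lam = max(abs(eigs[0]), eigs[-2])                  # lambda(G): largest nontrivial eigenvalue

rng = np.random.default_rng(1)
perm = rng.permutation(n)
S, T = perm[:20], perm[20:50]                      # two disjoint vertex sets
s = np.zeros(n); s[S] = 1
t = np.zeros(n); t[T] = 1

e_ST = s @ A @ t                                   # number of edges between S and T
print(abs(e_ST - d * len(S) * len(T) / n), "<=", lam * np.sqrt(len(S) * len(T)))
```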

Now, back to Theorem 2: we have a finite field {\mathbb F_p} and a subset {A}. Solymosi proves the desired lower bound on {\max\{ |A+A|, |A \cdot A| \}} by constructing a sum-product graph {G_{SP}} over {\mathbb F_p} and using its spectral properties to reason about {|A+A|} and {|A \cdot A|}. So without further ado, let’s define {G_{SP}}.

The vertex set of {G_{SP}} is {\left( \mathbb F_p \setminus \{0\} \right) \times \mathbb F_p}, and two vertices {(a,b),(c,d)} have an edge between them if {ac=b+d}. It is easy to see that {G_{SP}} has {p(p-1)} vertices and is {(p-1)}-regular (some edges are loops). We also have the following key fact.

Lemma 4 Consider {(a,b),(c,d) \in V(G_{SP})}. If {a=c} or {b=d} then these two vertices have no common neighbor, otherwise they have precisely one.

Proof: This follows from the fact that the unique solution of the system

\displaystyle  \begin{array}{c} ax=b+y \\ cx=d+y \end{array}

is given by

\displaystyle  \begin{array}{c} x = (b-d)(a-c)^{-1}\\ 2y = x(a+c)-b-d \end{array}

which is only defined when {a \neq c, b \neq d}. \Box

Lemma 4 can be used to show that {G_{SP}} is in fact a very good expander (those more familiar with the literature on expanders will recognize that moreover, {G_{SP}} is almost a Ramanujan graph).

Lemma 5 {\lambda(G_{SP}) < \sqrt{3p}}.

Proof: Let {M} be the adjacency matrix of {G_{SP}}. Recall that the {(u,v)}-entry of {M^2} is the number of walks from {u} to {v} of length 2. If {u=v}, this number is the degree {p-1}, while if {u \neq v}, with {u=(a,b)} and {v=(c,d)}, Lemma 4 tells us that this number is 1 if {a\neq c} and {b \neq d} and 0 otherwise. It follows that

\displaystyle  M^2 = J+(p-2)I-E \ \ \ \ \ (1)

where {J} is the all-1 matrix, {I} is the identity matrix, and {E} is the adjacency matrix of the graph {G_E} whose vertex set is the same as {G_{SP}}, and in which two vertices {(a,b)} and {(c,d)} are connected by an edge if {a=c} or {b=d}. It is easy to see that {G_E} is a {(2p-3)}-regular graph.

Since {G_{SP}} is regular and its adjacency matrix {M} is symmetric, we know that the all-1 vector is an eigenvector of {M} and all other eigenvectors are orthogonal to it. It is easy to check that {G_{SP}} is connected and not bipartite, so that the eigenvalue {p-1} has multiplicity 1, and for any other eigenvalue {\theta} we have {|\theta| < p-1}.

Given such an eigenvalue {\theta}, let {\bf v} be a corresponding eigenvector. Then by equation (1),

\displaystyle  \theta^2 \mathbf{v} = (p-2)\mathbf{v} - E \mathbf v

since {J\mathbf v} is the all-0 vector. Therefore {p-2-\theta^2} is an eigenvalue of {E}.

Now, the degree of {G_E} is an upper bound on the absolute value of every eigenvalue of {E}. It follows that

\displaystyle  p-2-\theta^2 \geq -2p+3

which implies {|\theta| < \sqrt{3p}}, as desired. \Box
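Here is a small numerical check of Lemma 5 for a small prime, a sketch assuming numpy: build {G_{SP}} over {\mathbb F_p} and compare {\lambda(G_{SP})} with {\sqrt{3p}}.

```python
import numpy as np

p = 13
vertices = [(a, b) for a in range(1, p) for b in range(p)]   # (F_p \ {0}) x F_p
index = {v: i for i, v in enumerate(vertices)}
N = len(vertices)                                            # p(p-1) vertices

M = np.zeros((N, N))
for (a, b) in vertices:
    for c in range(1, p):
        d = (a * c - b) % p              # (a,b) ~ (c,d) iff ac = b + d (mod p)
        M[index[(a, b)], index[(c, d)]] = 1

eigs = np.sort(np.linalg.eigvalsh(M))
print("degree:", eigs[-1])                         # p - 1 = 12
print("lambda(G_SP):", max(abs(eigs[0]), eigs[-2]))
print("sqrt(3p):", np.sqrt(3 * p))                 # the bound from Lemma 5
```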

So {G_{SP}} is an expander; very good, but what about {A}? Solymosi introduces {A} into the proof through very clever use of the expander mixing lemma: if we define {S,T \subseteq V(G_{SP})} by

\displaystyle  S=(A \cdot A) \times (-A), \ \quad T=(A^{-1}) \times (A+A)

then that lemma tells us that

\displaystyle  e(S,T) \leq \frac{|S||T|}{p} + \lambda(G_{SP}) \sqrt{|S||T|} < \frac{|A\cdot A||A+A||A|^2}{p} + \sqrt{3p|A \cdot A||A+A||A|^2}

where the second inequality used Lemma 5.

But for every {a,b,c \in A} there is an edge between {(ab,-c) \in S} and {(b^{-1},a+c) \in T}, so that {e(S,T) \geq |A|^3}. Using this observation and rearranging the resulting inequality gives

\displaystyle  \sqrt{|A \cdot A||A+A|} > \left( \frac{\sqrt{|A \cdot A||A+A|}}{p|A|}+\sqrt{3}\frac{p^{1/2}}{|A|^2} \right)^{-1}.

Now, since {(x+y)^{-1} \geq \frac{1}{2}\min\{x^{-1},y^{-1}\}} for positive {x} and {y}, we find that

\displaystyle  \sqrt{|A \cdot A||A+A|} = \Omega\left( \min\left\{ \frac{p|A|}{\sqrt{|A \cdot A||A+A|}}, |A|^2 p^{-1/2} \right\} \right)

which in turn implies

\displaystyle  \sqrt{|A \cdot A||A+A|} = \Omega\left( \min\left\{ |A|^{1/2}p^{1/2}, |A|^2 p^{-1/2} \right\} \right).

To finish the proof, we need only cite the two-term AM-GM inequality:

\displaystyle  \max\{ |A+A|, |A \cdot A| \} \geq \frac{|A \cdot A|+|A+A|}{2} \geq \sqrt{|A \cdot A||A+A|} .
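As a final empirical look at Theorem 2 (a Python sketch, ignoring the implied constant): draw random subsets {A \subseteq \mathbb F_p} of various sizes and compare {\max\{ |A+A|, |A \cdot A| \}} with the bound {\min\{ |A|^{1/2}p^{1/2}, |A|^2p^{-1/2} \}}.

```python
import random

p = 1009                                            # a prime
for size in [20, 100, 320]:
    A = random.sample(range(1, p), size)            # a random subset of F_p \ {0}
    sums = {(a + b) % p for a in A for b in A}
    products = {(a * b) % p for a in A for b in A}
    bound = min(size**0.5 * p**0.5, size**2 * p**-0.5)
    print(size, max(len(sums), len(products)), round(bound))
# For random sets both |A+A| and |A.A| are large, comfortably above the bound.
```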

4. A very terse bibliography

Solymosi’s proof is from his paper “Incidences and the spectra of graphs” (requires institutional access).

A more knowledgeable treatment of sum-product problems than I could ever provide can be found in these two entries from Terry Tao’s blog. In these, Tao provides a detailed introduction to the problem, gives the probabilistic proof of Elekes’ bound, discusses an interesting cryptographic application, and provides many references.

Igor Gorodezky