
Cayley-Hamilton Theorem and Jordan Canonical Form

October 28th, 2009

Last week I was discussing the Cayley-Hamilton Theorem with my officemates Hu Fu and Ashwin. The theorem is the following: given an {n \times n} matrix {A}, we can define its characteristic polynomial by {p_A(\lambda) = \det(A - \lambda I)}. The Cayley-Hamilton Theorem says that {p_A(A) = 0}. The polynomial is something like:

\displaystyle p_A(x) = a_k x^k + a_{k-1} x^{k-1} + \hdots + a_1 x^1 + a_0

so we can just see it as a formal polynomial and think of:

\displaystyle p_A(A) = a_k A^k + a_{k-1} A^{k-1} + \hdots + a_1 A + a_0 I

which is an {n \times n} matrix. The theorem says it is the zero matrix. We thought for a while, looked on Wikipedia, and there were a few proofs there, but not the one-line proof I was looking for. Later, I got this proof, which I sent to Hu Fu:

Write the matrix {A} in the basis of its eigenvectors; then we can write {A = \Gamma^t D \Gamma}, where {D} is the diagonal matrix with the eigenvalues on the main diagonal. Then:

\displaystyle A^k = (\Gamma^t D \Gamma) \hdots (\Gamma^t D \Gamma) = \Gamma^t D^k \Gamma

and since {D = \text{diag}(\lambda_1, \hdots, \lambda_n)} we have {D^k = \text{diag}(\lambda_1^k, \hdots, \lambda_n^k)}. Now, it is simple to see that:

\displaystyle p_A(A) = \Gamma^t p_A(D) \Gamma

and therefore:

\displaystyle p_A(D) = \begin{bmatrix} p_A(\lambda_1) & & & \\ & p_A(\lambda_2) & & \\ & & \ddots & \\ & & & p_A(\lambda_n) \end{bmatrix} = \begin{bmatrix} 0 & & & \\ & 0 & & \\ & & \ddots & \\ & & & 0 \end{bmatrix} = 0

And that was the one-line proof. An even simpler proof is: let {v_i} be the eigenvectors; then {p_A(A)v_i = p_A(\lambda_i)v_i = 0}, so {p_A(A)} must be {0} since it maps every element of a basis to zero. Well, I sent that to Hu Fu and he told me the proof had a bug. Not really a bug, but I was proving it only for symmetric matrices. More generally, I was proving it for diagonalizable matrices. He showed me, for example, the matrix:

\displaystyle  \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}

which has only one eigenvalue {0}, and the eigenvectors are all of the form {(x,0)} for {x \in {\mathbb R}}. So the dimension of the space spanned by the eigenvectors is {1}, less than the dimension of the matrix. This never happens for symmetric matrices, and I guess after some time as a computer scientist I got used to working only with symmetric matrices for almost everything I use: metrics, quadratic forms, correlation matrices, … but there is more out there than just symmetric matrices. The good news is that this proof is not hard to fix for the general case.
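Before fixing the proof, here is a quick sanity check: a minimal sympy sketch of my own (not part of the original argument) showing that this matrix is not diagonalizable and yet still satisfies its characteristic polynomial {\lambda^2}.

    # Minimal sympy sketch (illustrative only): the nilpotent matrix above is not
    # diagonalizable, yet Cayley-Hamilton still holds for it.
    import sympy as sp

    A = sp.Matrix([[0, 1], [0, 0]])
    lam = sp.symbols('lambda')

    p = (A - lam * sp.eye(2)).det()      # characteristic polynomial det(A - lambda*I)
    print(sp.expand(p))                  # lambda**2
    print(A.is_diagonalizable())         # False: only one independent eigenvector
    print(A**2)                          # p_A(A) = A^2 = the zero matrix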

First, it is easy to prove that for each root of the characteristic polynomial there is at least one eigenvector associated to it (just see that {\det(A - \lambda I) = 0}, and therefore there must be some {v \in \ker(A - \lambda I) \setminus \{ 0 \}}). Since eigenvectors associated to distinct eigenvalues are linearly independent, if all the roots are distinct there is a basis of eigenvectors, and therefore the matrix is diagonalizable (notice that maybe we will need to use complex eigenvalues, but that is ok). The good thing is that a matrix having two identical eigenvalues is a “coincidence”. We can identify matrices with {{\mathbb R}^{n^2}}. The matrices with repeated eigenvalues form a zero-measure subset of {{\mathbb R}^{n^2}}: they are the zero set of a polynomial in the entries {x_{ij}}, namely the resultant {R_{p,p'}} of the characteristic polynomial and its derivative. Therefore, we have proved the Cayley-Hamilton Theorem on the complement of a zero-measure set in {{\mathbb R}^{n^2}}. Since {A \mapsto p_A(A)} is a continuous function and vanishes on a dense set, it vanishes for all matrices {A \in {\mathbb R}^{n^2}}.
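That resultant condition is easy to play with. Here is a small sympy sketch of my own (the matrices are arbitrary examples): the resultant of the characteristic polynomial and its derivative vanishes exactly when there is a repeated eigenvalue.

    # Small sympy sketch (illustrative only): R_{p,p'} = 0 iff the characteristic
    # polynomial p has a repeated root, i.e. iff A has a repeated eigenvalue.
    import sympy as sp

    lam = sp.symbols('lambda')

    def repeated_eigenvalue_detector(A):
        p = (A - lam * sp.eye(A.rows)).det()
        return sp.resultant(p, sp.diff(p, lam), lam)

    A1 = sp.Matrix([[1, 2], [3, 4]])   # distinct eigenvalues -> nonzero resultant
    A2 = sp.Matrix([[0, 1], [0, 0]])   # double eigenvalue 0  -> resultant is 0
    print(repeated_eigenvalue_detector(A1))
    print(repeated_eigenvalue_detector(A2))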

We can also interpret that probabilistically: take a matrix {U} where each {U_{ij}} is drawn uniformly at random from {[0,1]}. Then {A + \epsilon U} has, with probability {1}, all eigenvalues distinct. So {p_{A+\epsilon U} (A+\epsilon U) = 0} with probability {1}. Now, just let {\epsilon \rightarrow 0}.
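Here is a rough numpy sketch of my own of that perturbation argument (floating point stands in for the {\epsilon \rightarrow 0} limit; numpy's poly uses the convention {\det(\lambda I - B)}, which differs from {p_B} only by a sign, so the check is the same):

    # Rough numpy sketch (illustrative only): evaluate the characteristic polynomial
    # of a perturbed matrix at the matrix itself; the result is (numerically) zero
    # for every epsilon, including the defective limit epsilon = 0.
    import numpy as np

    def char_poly_at_matrix(B):
        coeffs = np.poly(B)                  # coefficients of det(lambda*I - B), leading first
        n = B.shape[0]
        result = np.zeros((n, n))
        for c in coeffs:                     # Horner's rule with matrix powers
            result = result @ B + c * np.eye(n)
        return result

    A = np.array([[0.0, 1.0], [0.0, 0.0]])   # the defective example from above
    rng = np.random.default_rng(0)

    for eps in [1e-1, 1e-3, 1e-6, 0.0]:
        B = A + eps * rng.random((2, 2))
        print(eps, np.linalg.norm(char_poly_at_matrix(B)))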

Ok, this proves the theorem for real and complex matrices, but what about a matrix defined over a general field, where we can't use those continuity arguments? A way to get around it is by using the Jordan Canonical Form, which is a generalization of the eigenvector decomposition. Not all matrices have an eigenvector decomposition, but all matrices over an algebraically closed field can be written in Jordan Canonical Form (and proving {p_A(A) = 0} over the algebraic closure {\overline{K}} is enough, since the entries of {p_A(A)} live in {K}). Given any {A \in \overline{K}^{n^2}} there is an invertible matrix {\Gamma \in \overline{K}^{n^2}} so that:

\displaystyle A = \Gamma^{-1} \begin{bmatrix} B_1 & & & \\ & B_2 & & \\ & & \ddots & \\ & & & B_p \end{bmatrix} \Gamma

where {B_i} are blocks of the form:

\displaystyle B_i = \begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}

By the same argument as above, we just need to prove Cayley-Hamilton for each block separately, since {p_A(A) = \Gamma^{-1} \, \text{diag}(p_A(B_1), \hdots, p_A(B_p)) \, \Gamma}. So we need to prove that {p_A(B_i) = 0}. If the block has size {1}, then it is exactly the proof above. If the block is bigger, then we need to look at what {B_i^k} looks like. By inspection:

\displaystyle B_i^2 = \begin{bmatrix} \lambda^2 & 2\lambda & 1 & & \\ & \lambda^2 & 2\lambda & \ddots & \\ & & \ddots & \ddots & 1 \\ & & & \lambda^2 & 2\lambda \\ & & & & \lambda^2 \end{bmatrix}

In general, for {B_i^k} each row contains, starting on the diagonal, the sequence {\lambda^k, k \lambda^{k-1}, \binom{k}{2} \lambda^{k-2}, \hdots}, i.e., {\frac{1}{0!}\frac{d^0}{d\lambda^0} \lambda^k, \frac{1}{1!}\frac{d^1}{d\lambda^1} \lambda^k, \frac{1}{2!}\frac{d^2}{d\lambda^2} \lambda^k, \hdots} (this follows from expanding {B_i^k = (\lambda I + N)^k} with the binomial theorem, where {N} is the nilpotent matrix with {1}'s just above the diagonal). So, we have

\displaystyle p(B_i) = \begin{bmatrix} p(\lambda) & p'(\lambda) & \frac{p''(\lambda)}{2!} & \frac{p'''(\lambda)}{3!} & \hdots \\ & p(\lambda) & p'(\lambda) & \frac{p''(\lambda)}{2!} & \hdots \\ & & p(\lambda) & p'(\lambda) & \hdots \\ & & & p(\lambda) & \hdots \\ & & & & \ddots \end{bmatrix}

If block {B_i} has size {k}, then its eigenvalue {\lambda_i} is a root of {p(\cdot)} with multiplicity at least {k}, and therefore {p(\lambda_i) = p'(\lambda_i) = \hdots = p^{(k-1)}(\lambda_i) = 0}. Therefore {p(B_i) = 0}, as we wanted to prove.
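Here is a small sympy sketch of my own double-checking the block computation (the block size, eigenvalue, and polynomial are made up for illustration):

    # Minimal sympy sketch (illustrative only): a polynomial with a triple root at
    # lambda = 2, evaluated at a 3x3 Jordan block for 2, gives the zero matrix.
    import sympy as sp

    lam = sp.Rational(2)                    # eigenvalue of the block (arbitrary choice)
    B = sp.Matrix([[lam, 1, 0],
                   [0, lam, 1],
                   [0, 0, lam]])            # Jordan block of size 3

    x = sp.symbols('x')
    p = (x - lam)**3 * (x - 5)              # multiplicity 3 at lam, plus another root

    pB = sp.zeros(3, 3)
    for c in sp.Poly(p, x).all_coeffs():    # Horner's rule over matrices
        pB = pB * B + c * sp.eye(3)

    print(pB)                               # the zero matrix, as the argument predicts
    print((B**4)[0, :])                     # [16, 32, 24] = [lam^4, 4*lam^3, C(4,2)*lam^2]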

It turned out not to be a very short proof, but it is still short, since it uses mostly elementary facts and is really intuitive in some sense. I took some lessons from that: (i) it reinforces my idea that, if I need to say something about a matrix, the first thing to do is to look at its eigenvector decomposition. A lot of Linear Algebra problems become very simple when we consider things in the right basis, and normally the right basis is the eigenvector basis. (ii) Not all matrices are diagonalizable, but in those cases the Jordan Canonical Form comes to our rescue, and we can do almost the same as we would with the eigenvector decomposition.

Using expanders to prove sum-product inequalities in finite fields

September 23rd, 2009

I am happy to have the first guest post of BigRedBits written by Igor Gorodezky about an elegant and exciting result in combinatorics.

————————-

I’m fortunate enough to be spending this semester in beautiful Los Angeles as a core participant in the 2009 long program on combinatorics at IPAM (an NSF-funded math institute on UCLA’s campus). We’ve recently wrapped up the first part of the program, which consisted of tutorial lectures on various topics in combinatorics. There was an abundance of gorgeous mathematics, and with Renato’s kind permission I’d like to hijack his blog and write about what to me was one of the most memorable lectures.

This was a lecture by Jozsef Solymosi of the University of British Columbia describing some of his recent work on the sum-product problem in finite fields. In particular, he outlined a spectral-graph-theoretic proof of a recent sum-product inequality due to Garaev. Solymosi’s proof is an extremely clever and elegant application of spectral graph theory to a classical problem in combinatorial number theory, and so I thought I’d present it here. Before stating the result and giving Solymosi’s proof, let us begin with a very brief introduction to the sum-product problem.

1. Introduction

Given a finite set of real numbers {A}, define the sum set

\displaystyle  A+A = \{a+b \mid a,b \in A\}

and the product set

\displaystyle  A \cdot A = \{ab \mid a,b \in A\}.

Both the sum set and the product set clearly must have cardinality between {|A|} and {|A|^2}. Observe that if {A} is an arithmetic progression then {|A+A| \approx 2|A|} while {|A \cdot A| \approx |A|^2}, whereas if {A} is a geometric progression then {|A \cdot A| \approx 2|A|} while {|A + A| \approx |A|^2}. Intuition suggests that keeping {|A+A|} small by giving {A} lots of additive structure inevitably blows up {|A \cdot A|}, while keeping {|A \cdot A|} small by giving {A} lots of multiplicative structure in turn blows up {|A+A|}. For an arbitrary {A}, one would expect at least one of these sets, if not both, to be fairly large.
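A quick Python sketch (my own illustration; the particular progressions are arbitrary) makes this concrete:

    # Rough Python sketch (illustrative only): sum and product set sizes for an
    # arithmetic progression versus a geometric progression.
    def sum_set(A):
        return {a + b for a in A for b in A}

    def product_set(A):
        return {a * b for a in A for b in A}

    ap = list(range(1, 101))            # arithmetic progression 1, 2, ..., 100
    gp = [2**i for i in range(100)]     # geometric progression 1, 2, 4, ..., 2^99

    for name, A in [("AP", ap), ("GP", gp)]:
        print(name, len(A), len(sum_set(A)), len(product_set(A)))
    # AP: |A+A| = 199 (about 2|A|), while |A.A| comes out much larger
    # GP: |A.A| = 199 (about 2|A|), while |A+A| = 5050, the maximum possible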

Estimating the maximum of {|A+A|} and {|A \cdot A|} is the sum-product problem. It was posed in a paper by Erdos and Szemeredi, who proved the existence of a small constant {c} such that

\displaystyle  \max\{ |A+A|, |A \cdot A| \} = \Omega\left( |A|^{1+c} \right)

for any finite {A}. They conjecture that we actually have

\displaystyle  \max\{ |A+A|, |A \cdot A| \} = \Omega\left( |A|^{2-\delta} \right)

for any {\delta > 0} and sufficiently large {|A|}. In other words, the value of {c} in their bound can be made arbitrarily close to 1.

Much ink has been spilled in attempts to push up the value of {c}. At present, the best sum-product bound is due to Solymosi and gives {c \approx 3/11}. As an aside, I want to mention an extremely simple and elegant probabilistic proof of Elekes that gives {c=1/4}; it is detailed in Alon and Spencer’s classic text The Probabilistic Method, and is an absolute gem (look in the chapter on crossing numbers of graphs).

2. The finite field variant

Solymosi’s IPAM lecture was not, however, on this original sum-product problem, but rather on its natural finite field analogue: if {A} is a subset of {\mathbb F_p}, the finite field of prime order {p}, what can we say about the maximum of {|A+A|} and {|A \cdot A|}? Observe that it is important to consider fields whose order is prime and not a prime power, for in the latter case we could take {A} to be a proper subfield and end up with the degenerate case {|A|=|A+A|=|A \cdot A|}.

Bourgain, Katz and Tao got the party started in 2004 by proving the following sum-product bound.

Theorem 1 For all {\epsilon > 0} there exists {\delta > 0} such that if {A \subseteq \mathbb F_p } with {p^{\epsilon} < |A| < p^{1-\epsilon}} then

\displaystyle  \max\{ |A+A|, |A \cdot A| \} = \Omega\left( |A|^{1+\delta} \right).

We note that the implied constant also depends on {\epsilon}. The best known bound is the following, due to Garaev (2007).

Theorem 2 If {A \subseteq \mathbb F_p} then

\displaystyle  \max\{ |A+A|, |A \cdot A| \} = \Omega\left( \min\left\{ |A|^{1/2}p^{1/2},|A|^2p^{-1/2} \right\} \right).

To illustrate the theorem, observe that if {|A| \approx p^{2/3}} then {\max\{ |A+A|, |A \cdot A| \} = \Omega( |A|^{5/4})}. It is this theorem that Solymosi proves using spectral graph theory (Garaev’s original proof went by way of Fourier analysis and the estimation of exponential sums).
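To spell out the exponent arithmetic behind that example (this is just substitution, not part of the lecture): with {|A| = p^{2/3}},

\displaystyle  |A|^{1/2}p^{1/2} = p^{1/3 + 1/2} = p^{5/6} = |A|^{5/4} \qquad \text{and} \qquad |A|^2 p^{-1/2} = p^{4/3 - 1/2} = p^{5/6},

so the two terms in the minimum coincide at {|A|^{5/4}}.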

3. Solymosi’s proof

In this section we give Solymosi’s proof of Theorem 2 (we will assume familiarity with basic facts about eigenvalues and eigenvectors of adjacency matrices). Let us first establish some notation. Consider a {d}-regular graph {G} and let the eigenvalues of its adjacency matrix be

\displaystyle  d=\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n.

As usual, we define

\displaystyle  \lambda(G) = \max\{ \lambda_2,|\lambda_n|\}.

Recall that if {G} is connected and non-bipartite then {\lambda(G)} is strictly smaller than {d}, and such a graph is an expander if {d-\lambda(G)} is bounded below by some constant.

We will make use of a fundamental result in spectral graph theory: the expander mixing lemma.

Lemma 3 Let {G} be a {d}-regular graph with {n} vertices. For all {S,T \subseteq V(G)} we have

\displaystyle  \left| e(S,T) - \frac{d}{n}|S||T| \right| \leq \lambda(G) \sqrt{|S||T|}

where {e(S,T)} is the number of edges with one endpoint in {S} and the other in {T}.

The proof of the lemma is straightforward; we omit it because of space considerations, but it can be found in the survey on expanders by Hoory, Linial, and Wigderson.

Now, back to Theorem 2: we have a finite field {\mathbb F_p} and a subset {A}. Solymosi proves the desired lower bound on {\max\{ |A+A|, |A \cdot A| \}} by constructing a sum-product graph {G_{SP}} over {\mathbb F_p} and using its spectral properties to reason about {|A+A|} and {|A \cdot A|}. So without further ado, let’s define {G_{SP}}.

The vertex set of {G_{SP}} is {\left( \mathbb F_p \setminus \{0\} \right) \times \mathbb F_p}, and two vertices {(a,b),(c,d)} have an edge between them if {ac=b+d}. It is easy to see that {G_{SP}} has {p(p-1)} vertices and is {(p-1)}-regular (some edges are loops). We also have the following key fact.

Lemma 4 Consider {(a,b),(c,d) \in V(G_{SP})}. If {a=c} or {b=d} then these two vertices have no common neighbor, otherwise they have precisely one.

Proof: This follows from the fact that the unique solution of the system

\displaystyle  \begin{array}{c} ax=b+y \\ cx=d+y \end{array}

is given by

\displaystyle  \begin{array}{c} x = (b-d)(a-c)^{-1}\\ 2y = x(a+c)-b-d \end{array}

which is defined only when {a \neq c}, and gives a vertex of {G_{SP}} (i.e., has {x \neq 0}) only when {b \neq d}. \Box

Lemma 4 can be used to show that {G_{SP}} is in fact a very good expander (those more familiar with the literature on expanders will recognize that moreover, {G_{SP}} is almost a Ramanujan graph).

Lemma 5 {\lambda(G_{SP}) < \sqrt{3p}}.

Proof: Let {M} be the adjacency matrix of {G_{SP}}. Recall that the {(u,v)}-entry of {M^2} is the number of walks from {u} to {v} of length 2. If {u=v}, this number is the degree {p-1}, while if {u \neq v}, with {u=(a,b)} and {v=(c,d)}, Lemma 4 tells us that this number is 1 if {a\neq c} and {b \neq d} and 0 otherwise. It follows that

\displaystyle  M^2 = J+(p-2)I-E \ \ \ \ \ (1)

where {J} is the all-1 matrix, {I} is the identity matrix, and {E} is the adjacency matrix of the graph {G_E} whose vertex set is the same as {G_{SP}}, and in which two vertices {(a,b)} and {(c,d)} are connected by an edge if {a=c} or {b=d}. It is easy to see that {G_E} is a {(2p-3)}-regular graph.

Since {G_{SP}} is regular and its adjacency matrix {M} is symmetric, we know that the all-1 vector is an eigenvector of {M} and all other eigenvectors are orthogonal to it. It is easy to check that {G_{SP}} is connected and not bipartite, so that the eigenvalue {p-1} has multiplicity 1, and for any other eigenvalue {\theta} we have {|\theta| < p-1}.

Given such an eigenvalue {\theta}, let {\bf v} be a corresponding eigenvector. Then by equation (1),

\displaystyle  \theta^2 \mathbf{v} = (p-2)\mathbf{v} - E \mathbf v

since {J\mathbf v} is the all-0 vector. Therefore {p-2-\theta^2} is an eigenvalue of {E}.

Now, the degree of {G_E} is an upper bound on the absolute value of every eigenvalue of {E}. It follows that

\displaystyle  p-2-\theta^2 \geq -2p+3

which implies {|\theta| < \sqrt{3p}}, as desired. \Box
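Out of curiosity, here is a small numpy sketch of my own that builds {G_{SP}} for one small prime and checks the regularity and the eigenvalue bound numerically (a sanity check only, not part of Solymosi’s argument):

    # Small numpy sketch (illustrative only): build the sum-product graph for p = 13
    # and check that it is (p-1)-regular and that lambda(G) < sqrt(3p).
    import numpy as np

    p = 13
    vertices = [(a, b) for a in range(1, p) for b in range(p)]   # (F_p \ {0}) x F_p
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)                                            # p(p-1) vertices

    M = np.zeros((n, n))
    for (a, b) in vertices:
        for (c, d) in vertices:
            if (a * c) % p == (b + d) % p:                       # edge iff ac = b + d
                M[index[(a, b)], index[(c, d)]] = 1

    degrees = M.sum(axis=1)
    eigs = np.sort(np.linalg.eigvalsh(M))
    print(degrees.min(), degrees.max())                          # both p - 1 = 12
    print(eigs[-1])                                              # top eigenvalue p - 1 = 12
    print(max(eigs[-2], -eigs[0]), np.sqrt(3 * p))               # lambda(G) vs sqrt(3p) ~ 6.24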

So {G_{SP}} is an expander; very good, but what about {A}? (We may assume {0 \notin A}, since removing a single element does not affect the asymptotics.) Solymosi introduces {A} into the proof through very clever use of the expander mixing lemma: if we define {S,T \subseteq V(G_{SP})} by

\displaystyle  S=(A \cdot A) \times (-A), \ \quad T=(A^{-1}) \times (A+A)

then that lemma tells us that

\displaystyle  e(S,T) \leq \frac{|S||T|}{p} + \lambda(G_{SP}) \sqrt{|S||T|} < \frac{|A\cdot A||A+A||A|^2}{p} + \sqrt{3p|A \cdot A||A+A||A|^2}

where the second inequality used Lemma 5.

But for every {a,b,c \in A} there is an edge between {(ab,-c) \in S} and {(b^{-1},a+c) \in T}, so that {e(S,T) \geq |A|^3}. Using this observation and rearranging the resulting inequality gives

\displaystyle  \sqrt{|A \cdot A||A+A|} > \left( \frac{\sqrt{|A \cdot A||A+A|}}{p|A|}+\sqrt{3}\frac{p^{1/2}}{|A|^2} \right)^{-1}.

Now, since {(x+y)^{-1} \geq \frac{1}{2}\min\{x^{-1},y^{-1}\}} for positive {x} and {y}, we find that

\displaystyle  \sqrt{|A \cdot A||A+A|} = \Omega\left( \min\left\{ \frac{p|A|}{\sqrt{|A \cdot A||A+A|}}, |A|^2 p^{-1/2} \right\} \right)

which in turn implies (when the first term attains the minimum, this reads {|A \cdot A||A+A| = \Omega(p|A|)}, and taking square roots gives the first term below)

\displaystyle  \sqrt{|A \cdot A||A+A|} = \Omega\left( \min\left\{ |A|^{1/2}p^{1/2}, |A|^2 p^{-1/2} \right\} \right).

To finish the proof, we need only cite the two-term AM-GM inequality:

\displaystyle  \max\{ |A+A|, |A \cdot A| \} \geq \frac{|A \cdot A|+|A+A|}{2} \geq \sqrt{|A \cdot A||A+A|} .
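As a final sanity check, here is a rough Python sketch of my own (random choice of {A}, constants ignored) comparing {\max\{ |A+A|, |A \cdot A| \}} with Garaev’s lower bound:

    # Rough Python sketch (illustrative only, constants ignored): compare
    # max(|A+A|, |A.A|) with min(|A|^(1/2) p^(1/2), |A|^2 p^(-1/2)) for a random A.
    import random

    p = 1009                                    # a prime (arbitrary choice)
    A = random.sample(range(1, p), 60)          # random A with 0 excluded

    sum_set = {(a + b) % p for a in A for b in A}
    prod_set = {(a * b) % p for a in A for b in A}

    garaev = min(len(A) ** 0.5 * p ** 0.5, len(A) ** 2 * p ** -0.5)
    print(max(len(sum_set), len(prod_set)), garaev)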

4. A very terse bibliography

Solymosi’s proof is from his paper “Incidences and the spectra of graphs” (requires institutional access).

A more knowledgeable treatment of sum-product problems than I could ever provide can be found in these two entries from Terry Tao’s blog. In these, Tao provides a detailed introduction to the problem, gives the probabilistic proof of Elekes’ bound, discusses an interesting cryptographic application, and provides many references.

Igor Gorodezky