Cayley-Hamilton Theorem and Jordan Canonical Form

October 28th, 2009

I was discussing last week with my officemates Hu Fu and Ashwin about the Cayley-Hamilton Theorem. The theorem is the following: given an {n \times n} matrix {A}, we can define its characteristic polynomial by {p_A(\lambda) = \det(A - I\lambda)}. The Cayley-Hamilton Theorem says that {p_A(A) = 0}. The polynomial is something like:

\displaystyle p_A(x) = a_n x^n + a_{n-1} x^{n-1} + \hdots + a_1 x + a_0

so we can just see it as a formal polynomial and think of:

\displaystyle p_A(A) = a_n A^n + a_{n-1} A^{n-1} + \hdots + a_1 A + a_0 I

which is an {n \times n} matrix. The theorem says it is the zero matrix. We thought for a while, looked in Wikipedia, and there were a few proofs there, but not the one-line proof I was looking for. Later, I got this proof, which I sent to Hu Fu:

Write the matrix {A} in the basis of its eigenvectors; then we can write {A = \Gamma^t D \Gamma}, where {D} is the diagonal matrix with the eigenvalues on the main diagonal and {\Gamma} is orthogonal, so that {\Gamma \Gamma^t = I}. Then:

\displaystyle A^k = (\Gamma^t D \Gamma) \hdots (\Gamma^t D \Gamma) = \Gamma^t D^k \Gamma

and since {D = \text{diag}(\lambda_1, \hdots, \lambda_n)} we have {D^k = \text{diag}(\lambda_1^k, \hdots, \lambda_n^k)}. Now, it is simple to see that:

\displaystyle p_A(A) = \Gamma^t p_A(D) \Gamma

and therefore:

\displaystyle p_A(D) = \begin{bmatrix} p_A(\lambda_1) & & & \\ & p_A(\lambda_2) & & \\ & & \ddots & \\ & & & p_A(\lambda_n) \end{bmatrix} = \begin{bmatrix} 0 & & & \\ & 0 & & \\ & & \ddots & \\ & & & 0 \end{bmatrix} = 0

And that was the one-line proof. An even simpler proof is: let {v_i} be the eigenvectors; then {p_A(A)v_i = p_A(\lambda_i)v_i = 0}, so {p_A(A)} must be {0}, since it returns zero for all elements of a basis. Well, I sent that to Hu Fu and he told me the proof had a bug. Not really a bug, but I was proving it only for symmetric matrices; more generally, I was proving it for diagonalizable matrices. He showed me, for example, the matrix:

\displaystyle  \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}

which has only one eigenvalue, {0}, and the eigenvectors are all of the form {(x,0)} for {x \in {\mathbb R}}. So the dimension of the space spanned by the eigenvectors is {1}, less than the dimension of the matrix. This never happens for symmetric matrices, and I guess after some time as a computer scientist I got used to working only with symmetric matrices for almost everything I use: metrics, quadratic forms, correlation matrices, … but there is more out there than only symmetric matrices. The good news is that this proof is not hard to fix for the general case.
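To see the failure concretely, here is a quick numerical check (a sketch using numpy, my own illustration rather than anything from the original discussion) that the eigenvectors of this matrix span only a one-dimensional space:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])

# Both eigenvalues are 0, and the two eigenvector columns numpy returns
# are (numerically) parallel: the eigenspace is spanned by (1, 0) alone.
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                          # [0. 0.]
print(np.linalg.matrix_rank(eigvecs))   # 1: no basis of eigenvectors
```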

First, it is easy to prove that for each root of the characteristic polynomial there is at least one eigenvector associated to it (just see that {\det(A - \lambda I) = 0}, and therefore there must be {v \in \ker(A - \lambda I) \setminus \{ 0 \}}). So if all the roots are distinct, then there is a basis of eigenvectors, and therefore the matrix is diagonalizable (notice that we may need to use complex eigenvalues, but that is ok). The good thing is that a matrix having two identical eigenvalues is a “coincidence”. We can identify matrices with {{\mathbb R}^{n^2}}. The matrices with repeated eigenvalues form a zero-measure subset of {{\mathbb R}^{n^2}}: they are in fact the zero set of a polynomial in the entries {x_{ij}}, namely the resultant {R_{p,p'} = 0} of the characteristic polynomial and its derivative. Therefore, we have proved the Cayley-Hamilton theorem on the complement of a zero-measure set in {{\mathbb R}^{n^2}}. Since {A \mapsto p_A(A)} is a continuous function that vanishes on a dense set, it vanishes for all matrices {A \in {\mathbb R}^{n^2}}.

We can also interpret this probabilistically: take a matrix {U} where {U_{ij}} is drawn uniformly at random from {[0,1]}. Then {A + \epsilon U} has all distinct eigenvalues with probability {1}, so {p_{A+\epsilon U} (A+\epsilon U) = 0} with probability {1}. Now just let {\epsilon \rightarrow 0}.
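As a numerical sanity check of all this (a sketch using numpy, not part of the original post): a random matrix has distinct eigenvalues with probability 1, so we can extract the coefficients of its characteristic polynomial and evaluate the matrix polynomial by Horner's rule:

```python
import numpy as np

n = 5
A = np.random.rand(n, n)   # distinct eigenvalues with probability 1

# np.poly returns the coefficients of det(lambda*I - A), highest degree
# first (a sign convention away from det(A - lambda*I), which matters
# neither for the roots nor for vanishing).
coeffs = np.poly(A)

# Evaluate p_A(A) by Horner's rule, using matrix multiplication.
P = np.zeros((n, n))
for c in coeffs:
    P = P @ A + c * np.eye(n)

print(np.allclose(P, np.zeros((n, n))))   # True, up to floating point
```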

Ok, this proves the theorem for real and complex matrices, but what about a matrix defined over a general field, where we can’t use those continuity arguments? A way to get around that is the Jordan Canonical Form, which is a generalization of the eigenvector decomposition. Not all matrices have an eigenvector decomposition, but every matrix over an algebraically closed field can be written in Jordan Canonical Form: given any {A \in \overline{K}^{n \times n}} there is an invertible matrix {\Gamma \in \overline{K}^{n \times n}} so that:

\displaystyle A = \Gamma^{-1} \begin{bmatrix} B_1 & & & \\ & B_2 & & \\ & & \ddots & \\ & & & B_p \end{bmatrix}\Gamma

where {B_i} are blocks of the form:

\displaystyle B_i = \begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}

By the same argument as above, we just need to prove Cayley-Hamilton for each block separately, so we need to prove that {p_A(B_i) = 0}. If the block has size {1}, then it is exactly the proof above. If the block is bigger, then we need to look at what {B_i^k} looks like. By inspection:

\displaystyle B_i^2 = \begin{bmatrix} \lambda^2 & 2 \lambda & 1 & & \\ & \lambda^2 & 2 \lambda & 1 & \\ & & \ddots & \ddots & \ddots \\ & & & & \lambda^2 \end{bmatrix}

More generally, in each row of {B_i^k}, starting at the diagonal, we have the sequence {\lambda^k, k \lambda^{k-1}, \binom{k}{2} \lambda^{k-2}, \hdots}, i.e., {\frac{1}{0!}\frac{d^0}{d\lambda^0} \lambda^k, \frac{1}{1!}\frac{d^1}{d\lambda^1} \lambda^k, \frac{1}{2!}\frac{d^2}{d\lambda^2} \lambda^k, \hdots}. So we have:

\displaystyle p(B_i) = \begin{bmatrix} p(\lambda) & p'(\lambda) & \frac{p''(\lambda)}{2!} & \frac{p'''(\lambda)}{3!} & \hdots \\ & p(\lambda) & p'(\lambda) & \frac{p''(\lambda)}{2!} & \hdots \\ & & p(\lambda) & p'(\lambda) & \hdots \\ & & & p(\lambda) & \hdots \\ & & & & \ddots \\ \end{bmatrix}

If block {B_i} has size {k}, then {\lambda_i} is a root of {p(.)} with multiplicity at least {k}, and therefore {p(\lambda_i) = p'(\lambda_i) = \hdots = p^{(k-1)}(\lambda_i) = 0}; therefore {p(B_i) = 0}, as we wanted to prove.
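As a sanity check (a sketch using sympy, not from the original post), one can verify this symbolically for a single Jordan block:

```python
from sympy import Matrix, Symbol, eye, zeros

lam, x = Symbol('lambda'), Symbol('x')
k = 3

# A k x k Jordan block with eigenvalue lambda.
B = Matrix(k, k, lambda i, j: lam if i == j else (1 if j == i + 1 else 0))

# The block's own characteristic polynomial is (x - lambda)^k; since a
# size-k block forces (x - lambda)^k to divide p_A(x), checking it on
# the block is enough.
p = B.charpoly(x)

# Evaluate p(B) by Horner's rule with matrix multiplication.
P = zeros(k, k)
for c in p.all_coeffs():
    P = P * B + c * eye(k)

print(P.expand() == zeros(k, k))   # True
```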

It turned out not to be a very short proof, but it is still short, since it uses mostly elementary facts and is really intuitive in some sense. I took some lessons from it: (i) it reinforces my idea that if I need to say something about a matrix, the first thing to do is to look at its eigenvector decomposition. A lot of Linear Algebra problems become very simple when we consider things in the right basis, and normally the right basis is the eigenvector basis. (ii) Not all matrices are diagonalizable, but in those cases the Jordan Canonical Form comes to our help, and we can do almost the same as we did with the eigenvalue decomposition.

[LADIS 2009] Technical Session #4 – Monitoring and Repair

October 11th, 2009

Second Talk: A case for the accountable cloud by Andreas Haeberlen

Three cloud stories: a threatening cloud, a promising cloud, and a nice cloud 🙂

The problem with current clouds is that the user does not know what the cloud service provider is doing with the customer’s code and data. Also, from the cloud service provider’s perspective, the operator does not know what the code they are running for customers is supposed to do.

Alice is the customer running a service on the cloud owned and operated by Bob.

A solution: what if we had an oracle that Alice and Bob could ask about cloud problems? We want completeness (if something is faulty, we will know), accuracy (no false positives), and verifiability (the oracle can prove its diagnosis is correct).

Idea: make the cloud accountable to Alice and Bob. The cloud records its actions in a tamper-evident log; Alice and Bob can audit, and use the log to construct evidence that a fault does or does not exist.
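The talk does not spell out the log construction, but a minimal sketch of the standard hash-chain technique for tamper evidence (my own illustration with hypothetical names, not code from the talk) could look like:

```python
import hashlib

class TamperEvidentLog:
    """Append-only log where each entry commits to the whole prefix,
    so any later modification of an entry breaks the hash chain."""

    def __init__(self):
        self.entries = []          # list of (action, chained_hash) pairs

    def append(self, action: str) -> str:
        prev = self.entries[-1][1] if self.entries else b""
        h = hashlib.sha256(prev + action.encode()).digest()
        self.entries.append((action, h))
        return h.hex()             # authenticator an auditor can keep

    def verify(self) -> bool:
        prev = b""
        for action, h in self.entries:
            if hashlib.sha256(prev + action.encode()).digest() != h:
                return False       # evidence the log was tampered with
            prev = h
        return True
```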

Discussion: 1) Isn’t this too pessimistic? Bob isn’t malicious. Maybe, but Bob can get hacked, or things can just go wrong. 2) Shouldn’t Bob use fault tolerance instead? Yes, whenever we can, but masking faults is never perfect; we still need to check. 3) Why would a provider want to deploy this? This feature will be attractive to prospective customers, and helpful for support. 4) Are these the right guarantees? Completeness (no false negatives) could be relaxed to probabilistic completeness; verifiability could be relaxed to only providing some evidence; accuracy (no false positives) cannot be relaxed, because we need to have confidence when we rule out problems.

A call to action: cloud accountability should deliver provable guarantees, work for most cloud apps, require no changes to application code, cover a wide spectrum of properties, and have low overhead.

Work in Progress: Accountable Virtual Machines (AVM). Goal: provide accountability for arbitrary unmodified software. The cloud records enough data to enable deterministic replay; Alice can replay the log with a known-good copy of the software and can audit any part of the original execution.

Conclusion: current cloud designs carry risks for both customers and providers (mainly because of the split-administration problem). Proposed solution: the accountable cloud. Lots of research opportunities.

Third Talk: Learning from the Past for Resolving Dilemmas of Asynchrony by Paul Ezhilchelvan

In an asynchronous model you cannot bound message delivery time, or even message processing time at a machine. However, in a probabilistic synchronous model, we can bound these times within a certain probability via proactive measurements. The central hypothesis of the new model is that most of the time the performance of the past is indicative of the performance of the near future (i.e., delay in the past is indicative of delay in the future).

Design steps include doing proactive measurements, using them to establish synchrony bounds, assigning time bounds based on those, trying them to see how they work, and enabling exceptions. A sketch of the bound-estimation step appears below.
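For concreteness, here is a minimal sketch (my own illustration, not from the talk) of deriving a probabilistic delay bound from past measurements by taking a high quantile of recently observed delays:

```python
from collections import deque

class DelayEstimator:
    """Keeps a sliding window of measured delays and derives a timeout
    that past behavior would have met with the target probability."""

    def __init__(self, window: int = 1000, coverage: float = 0.99):
        self.samples = deque(maxlen=window)   # most recent delay samples
        self.coverage = coverage              # target probability

    def record(self, delay_ms: float) -> None:
        self.samples.append(delay_ms)

    def timeout(self) -> float:
        ordered = sorted(self.samples)
        if not ordered:
            raise ValueError("no measurements yet")
        # Index of the coverage-quantile sample (e.g., 99th percentile).
        idx = min(int(self.coverage * len(ordered)), len(ordered) - 1)
        return ordered[idx]
```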

On-going work: development of exception handling (to deal with the exceptional cases when mistakes are detected). Open environments are asynchronous; crash signals are used for notification of extreme unexpected behavior.


[LADIS 2009] Technical Session #5 – Communication

October 11th, 2009

First Talk: Bulletin Board: A Scalable and Robust Eventually Consistent Shared Memory over a Peer-to-Peer Overlay by Gregory Chockler

WebSphere Virtual Enterprise (WVE) is a product for managing resources in a data center. The product is a distributed system whose nodes and controllers need to communicate and share information, and BulletinBoard (BB) is used for that. BB is a platform service for facilitating group-based information sharing in a data center. It is a critical component of WVE, and its primary application is monitoring and control, but the designers believe it could be useful for other weakly consistent services.

Motivation & Contribution: The prior implementation of group communication was based on virtually synchronous group communication and was not designed to grow tenfold; it had robustness and stability problems and high runtime overheads as the system grew beyond several hundreds of processes, and its static hierarchy introduced configuration problems. So the goal was to provide a new implementation that resolves the scaling and stability issues of the prior one (and to implement it in a short time! This constraint had important implications on the design decisions).

BB supports a write-sub (write-subscribe) service model. It is a cross between pub-sub systems, shared memory systems, and traditional group communication systems. As in pub-sub, communication is asynchronous and done through topics. As in shared memory, we have overwrite semantics, a single writer per topic and process, and notifications that are snapshots of state.

Consistency Semantics (single topic). PRAM Consistency: notified snapshots are consistent with each process’s order of writes. A note made was that developers who built services on top of BB turned out to understand these consistency semantics.

Liveness Semantics (single topic). Eventual inclusion: eventually, each write by a correct and connected process is included in the notified snapshot. Eventual exclusion: failed processes are eventually excluded from updates.
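A toy sketch of the write-sub model (my own illustration with hypothetical names, not the WVE API): a single latest value per (process, topic) with overwrite semantics, and subscribers notified with snapshots:

```python
from typing import Callable, Dict, List

class BulletinBoardToy:
    """Toy write-sub board: one latest value per (process, topic),
    overwrite semantics, snapshot-style notifications."""

    def __init__(self):
        self.state: Dict[str, Dict[str, object]] = {}   # topic -> {process: value}
        self.subs: Dict[str, List[Callable]] = {}       # topic -> callbacks

    def subscribe(self, topic: str, callback: Callable) -> None:
        self.subs.setdefault(topic, []).append(callback)

    def write(self, process: str, topic: str, value: object) -> None:
        # Overwrite semantics: only the latest write per process survives.
        self.state.setdefault(topic, {})[process] = value
        snapshot = dict(self.state[topic])   # notify with a snapshot
        for cb in self.subs.get(topic, []):
            cb(snapshot)
```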

Performance and Scalability Goals: adequate latency, scalable runtime costs, and low overhead; throughput is less of an issue (management load is fixed and low). Also robustness and scalability in the presence of a large number of processes and topics (2883 topics in a system of 127 processes; note that the initial target was around 1000 processes).

Approach: build on an existing overlay network, the Service Overlay Network (SON). SON is a semi-structured P2P overlay, already in the product; it is self-* (recovers from changes quickly without problems), resilient, and supports peer membership and broadcast. The research question here was whether BB can be implemented efficiently on top of a P2P overlay like SON.

Architecture: SON with IAM (interest aware membership) built on top of it and BB on top of that (but BB can interact directly with SON).

Reliable Shared State Maintenance in SON for BB is made fully decentralized, and update propagation is optimized for bimodal topic popularity: iterative unicast over direct TCP connections if the number of subscribers of a topic is below a certain threshold, and overlay broadcast otherwise (sketched below). For reliability, the latest written value is periodically refreshed (on a long cycle) if not overwritten (this was a bad decision in retrospect), with state transfer to new/reconnected subscribers.
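A sketch of that propagation decision (my own illustration; the threshold value and function names are hypothetical):

```python
UNICAST_THRESHOLD = 8   # hypothetical cutoff for unpopular topics

def propagate(topic, value, subscribers, send_tcp, overlay_broadcast):
    """Unicast to each subscriber of an unpopular topic over direct TCP;
    fall back to overlay broadcast for popular topics."""
    if len(subscribers) < UNICAST_THRESHOLD:
        for peer in subscribers:
            send_tcp(peer, (topic, value))
    else:
        overlay_broadcast((topic, value))
```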

An experimental study on different topologies showed low CPU overhead and latency, but these numbers increased as the topology grew in size. Analysis revealed that this was because the periodic refreshes were stacked and caused increased CPU and latency overheads. An additional problem was broadcast flooding; when that was removed, CPU and latency overheads stayed flat as the topology increased in size.

Lessons learned: communication cost is the major factor affecting the scalability of overlay-based implementations, and anti-entropy techniques are the best fit for such services.


Second Talk: Optimizing Information Flow in the Gossip Objects Platform by Ymir Vigfusson

In gossip, nodes exchange information with a random peer periodically in rounds. Gossip has appealing properties such as bounded network traffic, scalability in group size, robustness against failures, and coding simplicity. This is nice when gossip is considered individually per application. In cloud computing, with nodes joining many groups, the traffic is no longer bounded per node (only per topic).

The Gossip Objects (GO) platform is a general platform for running gossip for multiple applications on a single node. It bounds the gossip traffic going out of a particular node. The talk focused on how to select the rumors to send out from the multiple applications on a single node so as to reduce the number of messages. This is possible because rumor messages are small and short-lived. An observation made is that rumors can be delivered indirectly: uninterested nodes can forward rumors to interested nodes.

The GO heuristic: recipient selection is biased towards higher group traffic. The content is selected by computing a utility for each rumor, defined as the probability that the rumor will add information to a host that didn’t already know it.
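A minimal sketch of that selection rule (my own illustration; the utility model is simplified and the names are hypothetical): stack the rumors with the highest probability of being news to the recipient, up to a message-size budget:

```python
def rumor_utility(rumor, recipient, p_already_known):
    """Estimated probability that this rumor adds information,
    i.e. that the recipient does not already know it."""
    return 1.0 - p_already_known(rumor, recipient)

def select_rumors(rumors, recipient, p_already_known, mtu, rumor_size):
    """Greedily stack the highest-utility rumors into one message
    until the MTU budget is exhausted."""
    ranked = sorted(rumors,
                    key=lambda r: rumor_utility(r, recipient, p_already_known),
                    reverse=True)
    message, used = [], 0
    for r in ranked:
        if used + rumor_size(r) <= mtu:
            message.append(r)
            used += rumor_size(r)
    return message
```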

Simulation: a first simulation of an extreme example, with only two nodes joining many groups, showed promising results for the GO heuristic. Then a real-world evaluation was conducted, based on a 55-minute trace of the IBM WebSphere Virtual Enterprise Bulletin Board layer. The trace had 127 nodes and 1364 groups, and the evaluation showed that GO placed a cap on traffic compared to the random and random-with-stacking heuristics. Additionally, the GO heuristic was able to deliver rumors faster than the other heuristics, and in the number of messages needed to deliver the rumors to interested nodes, GO achieved multiple orders of magnitude of reduction over the other heuristics and traditional rumor spreading.

Conclusion: GO implemented novel ideas such as per-node gossip, rumor stacking (packing rumors up to the MTU size), utility-based rumor dissemination, and adaptation to traffic rates. GO gives per-node guarantees even when the number of groups scales up. Experimental results were compelling.

Questions:

Mike Spreitzer, IBM Research: What would happen if the number of groups increases?

Answer: A study of available real-world traces showed a pattern of overlap. We also conducted simulations with other group membership patterns, and the results were similar.

—: What was the normal rumor size? And what would happen if that increased?

Answer: The average rumor size was 100 bytes. If the message size increased, we would stack fewer rumors, but our platform can also reject really large rumors.

—: Have you thought about network-level encoding?

Answer: Not yet, but we plan to in the future.

—: Have you thought of leveraging other dissemination techniques to run under GO?

Answer: Actually, we thought about the opposite direction, where we would run other communication protocols and map them under the hood to GO. Results are pending.
