
Programming and Storytelling

August 21st, 2009

Recently I was looking at Papadimitriou’s website for something else and found this great article called “MythematiCS: In Praise of Storytelling in the Teaching of Computer Science and Math”. He begins by pointing out that in early times knowledge was transferred mostly by storytelling, and that there is much more room for storytelling in contemporary technical teaching than most people realize. He makes several interesting points: one of them is that we can think of writing a computer program as telling a story. For example, the variables are the characters: they have characteristics (data types) and the whole story (program) is about playing around with them. Sometimes they have multiple faces and behaviors depending on the circumstance (polymorphism). Iteration and recursion are common literary tools, used for example in fairy tales: “on the first day, this happens, on the second day, that happens, then…” or “he might be able to do that only if he first does this…”. He mentions one of my favourite books, “If on a Winter’s Night a Traveler”, as a great example of recursion. This made me think that maybe Italo Calvino is my favourite author because his stories are so beautifully constructed, in an almost mathematical fashion, like an Escher painting or Bach’s music. Both went very far in showing the beauty of math and showing that it is really an art. For example, this beautiful representation of the hyperbolic plane:

[Image: M. C. Escher, Circle Limit III]

Back to programming, there are still a lot of interesting relations: several novels are multi-threaded, in the sense that we look at the story from the perspectives of multiple characters. Stories also need to “compile and run”, which in this case means: make sense and be accepted by people. I was also thinking that there are a lot of books which everyone knows about but very few people have ever read (Ulysses, for example). Are those NP-complete problems?

Back to Papadimitriou’s article: he talks about a few interesting books that do a good job of mixing math and stories. One that he mentions I read a long time ago, back when I was in high school, and it did a great job of further stimulating my interest in math: The Parrot’s Theorem. Recently I also read another book he mentions: Surreal Numbers, by Don Knuth. Although I am a great fan of almost everything Knuth writes, this book didn’t catch me as much. I think it may be because I am not the right audience: if I had read it a couple of years earlier, I might have enjoyed it much more.

When I was in Greece last year, I came across a very interesting comic book: Logicomix. It was in Greek, but just by looking through it I figured out it was something about math, and it seemed pretty cool. Later I found out it was written by Papadimitriou and Doxiadis, which made me even more curious to read it. Now I am waiting for its English translation. One last pointer: Doxiadis has a webpage with some interesting essays about the relations between mathematical proofs, computer programming and storytelling.


Duality Theorem for Semidefinite Programming

August 9th, 2009

I’ve been reading through Luca Trevisan’s blog posts about relaxations of the Sparsest Cut problem. He discusses three relaxations: (i) spectral relaxation; (ii) Leighton-Rao; and (iii) Arora-Rao-Vazirani. Relaxations (i) and (iii) are semi-definite programs (SDPs), an incredibly beautiful and useful generalization of Linear Programs. This technique is at the core of the famous MAX-CUT algorithm of Goemans and Williamson and has also been applied to design approximation algorithms for a large variety of combinatorial problems. As with Linear Programming, there is a Duality Theorem that allows us to give certificates for lower and upper bounds. I was curious to learn more about it and my friend Ashwin suggested these great lecture notes by Laszlo Lovasz on semi-definite programming. I was surprised that the proof of the duality theorem for SDP is almost the same as for LP, changing just a few steps. We begin by using the Convex Separation Theorem to derive a semi-definite version of Farkas’ Lemma, and then we use this version of Farkas’ Lemma to prove the Duality Theorem in a manner similar to what is normally done for LP. Let’s go through the proof, but first, let’s define positive semidefinite matrices:

Definition 1 A matrix {A} is said to be positive semidefinite (denoted by {A \succeq 0}) if for all {x \in {\mathbb R}^n} we have {x^t A x \geq 0}.

It is the same as saying that all eigenvalues of {A} are non-negative, since the smallest eigenvalue of {A} is given by {\min_{\Vert x \Vert = 1} x^t A x}. We denote by {A \succ 0} the case when all eigenvalues of {A} are strictly positive, which is the same as {x^t A x > 0} for all {x \neq 0}. Also, given two matrices {A} and {B} we use the notation {A \bullet B = \sum_{ij} A_{ij} B_{ij}}, which is nothing more than the dot product when we view those matrices as vectors in {{\mathbb R}^{n^2}}.
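
These definitions are easy to check numerically. Here is a minimal sketch (assuming numpy is available; the matrices are made up for illustration):

    import numpy as np

    def is_psd(A, tol=1e-9):
        # Definition 1 via eigenvalues: A is PSD iff its smallest eigenvalue,
        # which equals min over unit x of x^t A x, is non-negative.
        # (eigvalsh assumes A is symmetric.)
        return np.linalg.eigvalsh(A).min() >= -tol

    def bullet(A, B):
        # A . B = sum_ij A_ij B_ij: the dot product of A and B seen as
        # vectors in R^{n^2}.
        return (A * B).sum()

    A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3, so A is PSD
    B = np.eye(2)

    print(is_psd(A))                                   # True
    print(bullet(A, B))                                # 4.0
    print(np.isclose(bullet(A, B), np.trace(A @ B)))   # True: A . B = tr(AB) for symmetric A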

Definition 2 A semi-definite program (SDP) is an optimization problem where the variable is a matrix {X} and has the form:


\displaystyle  \begin{aligned} \min \ & C \bullet X \\ \text{s.t. } & D_i \bullet X = d_i \quad \forall i \\ & X \succeq 0 \end{aligned} \ \ \ \ \ (1)

that is, it is a linear program plus the restriction that the variable, viewed as a matrix, must be positive semi-definite.
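
For concreteness, here is how one could write down and solve a tiny instance of program (1); this is a sketch assuming the cvxpy package, with a made-up instance consisting of the single constraint {tr(X) = 1}:

    import cvxpy as cp
    import numpy as np

    # A tiny instance of (1): min C . X  s.t.  D_1 . X = d_1, X PSD.
    C = np.array([[1.0, 0.0], [0.0, 2.0]])
    D1, d1 = np.eye(2), 1.0                  # single constraint: tr(X) = 1

    X = cp.Variable((2, 2), symmetric=True)
    constraints = [cp.trace(D1 @ X) == d1,   # D_1 . X = tr(D_1 X) for symmetric D_1
                   X >> 0]                   # X >> 0 encodes X PSD
    prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
    prob.solve()

    print(round(prob.value, 4))   # 1.0: the smallest eigenvalue of C
    print(X.value)                # all "mass" on the cheaper eigenvector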

The key point is that the set {\mathcal{P} \subseteq {\mathbb R}^{n^2}} of semi-definite matrices is a convex cone, i.e., if {A, B \in \mathcal{P}} then {c_1 A + c_2 B \in \mathcal{P}} for any {c_1, c_2 \geq 0}; in particular, it is easy to see that it is a convex set. Now we use the following theorem to prove Farkas’ Lemma:

Theorem 3 (convex separation) Given two convex sets {C_1, C_2 \subseteq {\mathbb R}^n} such that {C_1 \cap C_2 = \emptyset}, there are {p, v \in {\mathbb R}^n} with {v \neq 0} such that {v^t (x - p) \leq 0} for all {x \in C_1} and {v^t (x - p) \geq 0} for all {x \in C_2}. Besides, if one of them is a cone, it holds with {p = 0}.

Theorem 4 (Semi-definite Farkas’ Lemma) Exactly one of the following problems has a solution:

  1. {x_1 A_1 + \hdots + x_k A_k \succ 0}, {x_i \in {\mathbb R}}
  2. {A_1 \bullet Y = 0}, …, {A_k \bullet Y = 0}, {Y \succeq 0}, {Y \neq 0}

Proof: If problem {1} doesn’t have a solution, then the cone of positive definite matrices (the interior of {\mathcal{P}}) is disjoint from the convex cone {\{ \sum_i x_i A_i ; x_i \in {\mathbb R} \}} and therefore they can be separated. This means that there is a nonzero {Y} ({Y} plays the role of {v} in the convex separation theorem) such that: {\left( \sum_i x_i A_i \right) \bullet Y \leq 0} for all {x_1, \hdots, x_k \in {\mathbb R}} and {Y \bullet M \geq 0} for all {M \in \mathcal{P}}. Now, taking {x_i = 1} and all others {0}, and then {x_i = -1} and all others {0}, we easily get that {A_i \bullet Y = 0}.

It remains to prove that {Y \succeq 0}. Just take {M = x x^t} for {x \in {\mathbb R}^n} and then we have that {Y \bullet M = x^t Y x \geq 0} for all {x \in {\mathbb R}^n} and therefore, {Y \succeq 0}. \Box

Now, we can use this to prove the duality theorem. We will use a slightly different version of Farkas’ Lemma together with an elementary result in Linear Algebra. The following version comes from applying the last theorem to the matrices:

\displaystyle \begin{bmatrix} A_1 & \\ & 0 \end{bmatrix}, \hdots, \begin{bmatrix} A_k & \\ & 0 \end{bmatrix}, \begin{bmatrix} -B & \\ & 1 \end{bmatrix}

instead of {A_1, \hdots, A_{k+1}}.

Theorem 5 (Semi-definite Farkas’ Lemma: Inhomogeneous Version) Exactly one of the following problems has a solution:

  1. {x_1 A_1 + \hdots + x_k A_k \succ B}, {x_i \in {\mathbb R}}
  2. {A_1 \bullet Y = 0}, …, {A_k \bullet Y = 0}, {B \bullet Y \geq 0}, {Y \succeq 0}, {Y \neq 0}

where {A \succ B} means {A - B \succ 0}. Now, an elementary Linear Algebra result before we can proceed to the proof of the duality theorem:

Theorem 6 Given two positive semi-definite matrices {A} and {B}, we have {tr(AB) \geq 0}, where {tr(M) = \sum_i M_{ii}}.

Proof: The trace of a matrix is invariant under change of basis, i.e. for any non-singular matrix {\Gamma}, we have that {tr(A) = tr(\Gamma^{-1} A \Gamma)}. This is very easy to see, since:

\displaystyle  \begin{aligned} tr(\Gamma^{-1} A \Gamma) & = \sum_i (\Gamma^{-1} A \Gamma)_{ii} = \sum_i \sum_j (\Gamma^{-1})_{ij} (A \Gamma)_{ji} \\ & = \sum_i \sum_j \sum_k (\Gamma^{-1})_{ij} A_{jk} \Gamma_{ki} \\ & = \sum_j \sum_k A_{jk} \left( \sum_i \Gamma_{ki} (\Gamma^{-1})_{ij} \right) \\ & = \sum_j \sum_k A_{jk} \delta_{jk} = tr(A) \end{aligned}

Now we can work in the basis of {A}‘s eigenvectors. Let {\Gamma} be an orthogonal matrix ({\Gamma^t \Gamma = I}) such that {A = \Gamma D \Gamma^{t}}, where {D = diag(\lambda_1, \hdots, \lambda_n)}. Now:

\displaystyle  \begin{aligned} tr(AB) & = tr(\Gamma^{t} AB \Gamma) = tr(\Gamma^{t} A \Gamma \, \Gamma^{t} B \Gamma) = tr(D \, \Gamma^{t} B \Gamma) \\ & = \sum_{i} \lambda_i (\Gamma^{t} B \Gamma)_{ii} \geq 0 \end{aligned}

since {\lambda_i \geq 0} (because {A} is positive semi-definite) and:

\displaystyle (\Gamma^{t} B \Gamma)_{ii} = e_i^t \Gamma^{t} B \Gamma e_i = (\Gamma e_i)^t B (\Gamma e_i) \geq 0

\Box
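
A quick numeric check of Theorem 6, mirroring the steps of the proof (a sketch assuming numpy; the random PSD instances are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)

    def random_psd(n):
        # M^t M is always positive semi-definite.
        M = rng.standard_normal((n, n))
        return M.T @ M

    A, B = random_psd(4), random_psd(4)
    print(np.trace(A @ B) >= 0)     # True, as Theorem 6 promises

    # Mirror the proof: A = Gamma D Gamma^t, so
    # tr(AB) = sum_i lambda_i (Gamma^t B Gamma)_ii, a sum of non-negative terms.
    lam, Gamma = np.linalg.eigh(A)  # columns of Gamma are eigenvectors of A
    terms = lam * np.diag(Gamma.T @ B @ Gamma)
    print(np.allclose(terms.sum(), np.trace(A @ B)))   # True
    print((terms >= -1e-9).all())                      # True: every term non-negative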

Now, we are ready for the Duality Theorem:

Theorem 7 (SDP Duality) The dual of program (1) is given by:


\displaystyle  \begin{aligned} \max \ & d^t y \\ \text{s.t. } & y_1 D_1 + \hdots + y_k D_k \preceq C \end{aligned} \ \ \ \ \ (2)

We call (1) the primal problem and (2) the dual. If {X} is feasible for (1) and {y} is feasible for (2), then {d^t y \leq C \bullet X}. Besides, if the dual has a strictly feasible solution (i.e., one with {\sum_i y_i D_i \prec C}), then the primal optimum is attained and both problems have the same optimal value.
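
Before the proof, a quick numeric illustration of the theorem on the toy instance from before (again a sketch assuming cvxpy; here {y = 0} is strictly feasible for the dual since {C \succ 0}, so the theorem predicts equal optimal values):

    import cvxpy as cp
    import numpy as np

    C = np.array([[1.0, 0.0], [0.0, 2.0]])
    D1, d1 = np.eye(2), 1.0

    # Primal (1): min C . X  s.t.  D_1 . X = d_1, X PSD.
    X = cp.Variable((2, 2), symmetric=True)
    primal = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                        [cp.trace(D1 @ X) == d1, X >> 0])
    primal.solve()

    # Dual (2): max d^t y  s.t.  y_1 D_1 <= C (in the PSD order).
    y = cp.Variable()
    dual = cp.Problem(cp.Maximize(d1 * y), [C - y * D1 >> 0])
    dual.solve()

    print(round(primal.value, 4), round(dual.value, 4))   # 1.0 1.0: no duality gap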

As we can see in the theorem above, the main difference from the duality theorem of linear programming is this “strictly feasible” condition. Let’s proceed to the proof. As in the LP case, the outline is the following: first we prove weak duality ({d^t y \leq C \bullet X} for feasible {X, y}), and then we consider the dual restrictions together with {d^t y > v^*_{dual}} and apply Farkas’ Lemma:

Proof: Weak duality: If {X} is feasible for (1) and {y} is feasible for (2), then:

\displaystyle C\bullet X \geq \left( \sum_i y_i D_i \right) \bullet X = \sum_i y_i D_i\bullet X

because {C - \sum_i y_i D_i \succeq 0} and the {\bullet} of two positive semi-definite matrices is non-negative (Theorem 6). Now:

\displaystyle d^t y = \sum_i y_i d_i = \sum_i y_i D_i\bullet X

by the constraints of (1). Therefore, {C\bullet X \geq d^t y}.

Strong duality: Clearly, the following system has no solution {y \in {\mathbb R}^k}:

\displaystyle  y_1 D_1 + \hdots + y_k D_k \prec C, \qquad y^t d > v^*_{dual}

where {v^*_{dual}} is the optimal value of (2). Now we apply the inhomogeneous Farkas’ Lemma to:

\displaystyle  y_1 \begin{bmatrix} -D_1 & \\ & d_1 \end{bmatrix} + \hdots + y_k \begin{bmatrix} -D_k & \\ & d_k \end{bmatrix} \succ \begin{bmatrix} -C & \\ & v^*_{dual} \end{bmatrix}

and we get a nonzero matrix {X' = \begin{bmatrix} X & x \\ x^t & x_{n+1} \end{bmatrix}} (where {x} is a vector and {x_{n+1}} a scalar) such that:

\displaystyle  \begin{bmatrix} -D_i & \\ & d_i \end{bmatrix} \bullet X' = 0, \quad \begin{bmatrix} -C & \\ & v^*_{dual} \end{bmatrix} \bullet X' \geq 0, \quad X' \succeq 0

which means: {D_i \bullet X = d_i x_{n+1}} and {C \bullet X \leq v^*_{dual} x_{n+1}}. We know that {X \succeq 0} and {x_{n+1} \geq 0}. We just need to prove that {x_{n+1} > 0}. We already know it is not negative, so we need to prove it is not zero, and here we will need the hypothesis that the dual has a strictly feasible solution. If {x_{n+1} = 0}, then {x = 0} as well (a positive semi-definite matrix with a zero diagonal entry has a zero row and column), so {X \neq 0} and there would be a nonzero solution to:

\displaystyle X\bullet D_i = 0, \quad X\bullet C \leq 0, \quad X \succeq 0

and by Farkas’ Lemma (applied with {A_i = -D_i} and {B = -C}), the problem:

\displaystyle \sum_i y_i D_i \prec C

would have no solution; but we know it does have a solution, by the strict feasibility hypothesis. Therefore, {x_{n+1} > 0}.

Everything else comes from combining the two parts: {X / x_{n+1}} is feasible for (1) with {C \bullet (X / x_{n+1}) \leq v^*_{dual}}, and weak duality gives the reverse inequality, so both optimal values coincide.

\Box

Mechanism Design

August 5th, 2009

In the early days of algorithm design, the goal was to solve combinatorial problems exactly and efficiently. Cook’s Theorem and the theory of NP-completeness showed us that some problems are inherently hard. How can we cope with that? By trying to find approximate solutions, or by looking at restricted instances of those problems that are tractable. It turns out that lack of computational power (NP-hardness, in some sense) is not the only obstacle to solving problems optimally. Online algorithms propose a model where the difficulty is due to lack of information: the algorithm must make some decisions before seeing the entire input. Again, in most cases it is hopeless to get an optimal solution (the solution we could get if we knew the input ahead of time), and our goal is to not be far from it. Another example of a natural limitation is streaming algorithms, where we must solve a problem with limited memory. Imagine, for example, an algorithm running in a router that receives gigabits of traffic each second: it is impossible to store all the information, and yet we want to process this very large input and give an answer at some point.

An additional model is inspired by economics: there are a bunch of agents, each holding part of the input to the algorithm, and they are interested in the solution, say, they have a certain value for each final outcome. Now they must release their part of the input, and they may lie about it to manipulate the final result according to their own interest. How to prevent that? We need to augment the algorithm with economic incentives to make sure they don’t harm the final solution too much. We need to care about two things now: we still want a solution not too far from the optimal, but we also need to provide the right incentives. Such an algorithm with incentives is called a mechanism, and this is the subject of an important field in Algorithmic Game Theory called Mechanism Design.

The simplest setting where this can occur is a matching problem. Suppose there are {n} people and one item, and I want to decide which person gets the item. Person {i} has a value {v_i} for the item. I ask people their values and give the item to the person with the highest value, and this way I maximize the total value of the matching. But wait, this is not a mechanism yet: people have no incentive to play truthfully in this game, since they may report higher valuations to get the item. The natural solution is to charge some payment {p} from whoever gets the item. But how much to charge? The answer is the Vickrey auction, where we charge the winner the second highest bid. More about the Vickrey auction and some other examples of truthful mechanisms can be found in the Algorithmic Game Theory book, which is available on Tim Roughgarden’s website.
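
Here is a minimal sketch of the Vickrey rule in plain Python (the numbers are made up), showing why reporting your true value is a dominant strategy:

    def vickrey_auction(bids):
        # Give the item to the highest bidder; charge the second-highest bid.
        winner = max(range(len(bids)), key=lambda i: bids[i])
        price = max(b for i, b in enumerate(bids) if i != winner)
        return winner, price

    values = [10, 7, 3]                  # true values v_i
    print(vickrey_auction(values))       # (0, 7): the winner pays the second price

    # Overbidding cannot help: if bidder 1 (value 7) bids 12, she wins but
    # pays 10, for a negative utility of 7 - 10 = -3.
    print(vickrey_auction([10, 12, 3]))  # (1, 10)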

One of the most famous early papers in this area is “Optimal Auction Design” by Myerson, where he discusses auction mechanisms that maximize the profit of the seller. After reading some papers in this area, I thought I should return and read the original paper by Myerson, and it was indeed a good idea, since the paper formalizes (or makes very explicit) a lot of central concepts in this area. I wanted to blog about the two main tools in mechanism design discussed in the paper: the revelation principle and the revenue equivalence theorem.

Revelation Principle

We could imagine thousands of possible ways of providing economic incentives, and it seems a difficult task to look at all possible mechanisms and choose the best one. The good thing is: we don’t need to look at all kinds of mechanisms; we can look only at truthful revelation mechanisms. Let’s formalize that: consider {n} bidders, each with a value {v_i} for getting what they want (suppose they are single-parameter agents, i.e., they get {v_i} if they get what they want and {0} otherwise). We can represent the final outcome as a vector {x \in [0,1]^n}. Here we also allow randomized mechanisms, where the final outcome can be, for example, to allocate the item to bidder {i} with probability {x_i}. This vector has some restrictions imposed by the structure of the problem, e.g., if there is only one item, then we must have {\sum_i x_i \leq 1}. At the end of the mechanism, each player pays some amount {p_i} to the mechanism. Player {i} then has utility {u_i(x,p) = x_i v_i - p_i}, and the total profit of the auctioneer is {\sum_i p_i}.

Let’s describe a general mechanism: each player has a set of strategies, say {\Theta_i}. Each player chooses one strategy {\theta_i \in \Theta_i} and the mechanism chooses an allocation and a payment function based on the strategies of the players: {x = x(\theta_1, \hdots,\theta_n)} and {p = p(\theta_1, \hdots,\theta_n)}. Notice this includes even complicated multi-round strategies: in that case, the strategy space {\Theta_i} would be a very complicated description of what a player would do for each outcome of each round.

Let {V_i} be the set of possible valuations a player might have. Given the mechanism, each player picks a function {\theta_i: V_i \rightarrow \Theta_i}, i.e., {\theta_i(v_i)} is the strategy he would pick if he observed his value was {v_i}. The mechanism has an equilibrium if there is a set of such functions in equilibrium, i.e., no one can change his function and be better off. If those exist, we can implement this as a direct revelation mechanism. A direct revelation mechanism is a mechanism where the strategy is simply to reveal a value in {V_i}: I just ask each player his valuation.

Given a complicated mechanism {x: \Theta \rightarrow [0,1]^n}, {p: \Theta \rightarrow {\mathbb R}_+^n} and equilibrium strategies {\theta_i: V_i \rightarrow \Theta_i}, I can implement it as a direct revelation mechanism just by taking:

\displaystyle x'(v_1, \hdots, v_n) = x(\theta_1(v_1), \hdots, \theta_n(v_n))

\displaystyle p'(v_1, \hdots, v_n) = p(\theta_1(v_1), \hdots, \theta_n(v_n))
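
In code, this construction is just function composition. Below is a minimal sketch in plain Python (the original mechanism and the identity equilibrium strategies are toy stand-ins for illustration):

    def direct_revelation(x, p, thetas):
        # Compose the original mechanism (x, p) with the equilibrium
        # strategies theta_i : V_i -> Theta_i to get (x', p').
        def x_prime(values):
            return x([theta(v) for theta, v in zip(thetas, values)])
        def p_prime(values):
            return p([theta(v) for theta, v in zip(thetas, values)])
        return x_prime, p_prime

    # Toy original mechanism: highest "strategy" wins and pays the second
    # highest; the equilibrium strategy is simply to bid your own value.
    def x(bids):
        w = bids.index(max(bids))
        return [1 if i == w else 0 for i in range(len(bids))]

    def p(bids):
        w = bids.index(max(bids))
        return [sorted(bids)[-2] if i == w else 0 for i in range(len(bids))]

    x_prime, p_prime = direct_revelation(x, p, [lambda v: v] * 3)
    print(x_prime([10, 7, 3]), p_prime([10, 7, 3]))   # [1, 0, 0] [7, 0, 0]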

It is not hard to see that if the mechanism is {(x', p')}, the best thing for the players to do is to reveal their true valuation directly, eliminating all the complicated steps in between. One practical example of a direct revelation mechanism is the Vickrey auction. Consider the English auction, the famous auction that appears in most movies depicting auctions (or at Sotheby’s, for a concrete example): there is one item and people keep raising their bids until everyone else drops out, and the last person still bidding gets the item. The clear strategy in those auctions is to keep raising your bid as long as the current value is below your valuation {v_i} and there are still other bidders who haven’t dropped out. Clearly the person with the highest value {v_i} will get the item. Let {v_j} be the second highest value. It is not hard to see that all but one will drop out of the auction after {v_j}, so the highest bidder is likely to pay {v_j + \epsilon}. This is exactly the Vickrey auction: we just emulate the English auction as a direct revelation mechanism. There are of course other issues. The following quote I got from this paper:

“God help us if we ever take the theater out of the auction business or anything else. It would be an awfully boring world.” (A. Alfred Taubman, Chairman, Sotheby’s Galleries)

So, we can restrict our attention to mechanisms of the form {x : V \rightarrow [0,1]^n} and {p : V \rightarrow {\mathbb R}_+^n} that are truthful, i.e., where the players have no incentive not to report their true valuations. We can characterize truthful auctions using the next theorem. Just a bit of notation first: let {v_{-i} = (v_1, \hdots, v_{i-1}, v_{i+1}, \hdots, v_n)}, {x_{-i} = (x_1, \hdots, x_{i-1}, x_{i+1}, \hdots, x_n)}, …, let {f} be the joint probability distribution over values, and let {f_{-i}} be the joint probability distribution over {v_{-i}}:

Theorem 1 An auction is truthful if and only if, for all possible probability distributions over values given by {f_1}, …, {f_n} we have

  1. {x_i(v_i) = \int x_i(v_i, v_{-i}) f_{-i}(v_{-i}) dv_{-i}} is monotone non-decreasing
  2. {p_i(v_i) = v_i x_i(v_i) - \int_0^{v_i} x_i(z) dz} where {p_i(v_i) = \int p_i(v_i, v_{-i}) f_{-i}(v_{-i}) dv_{-i}}
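
The payment identity in part 2 can be evaluated directly. A sketch in plain Python (crude numeric integration; the fixed competing bids are made up), showing that in a single-item auction it reproduces exactly the Vickrey price:

    import numpy as np

    def myerson_payment(x_i, v_i, step=1e-3):
        # p_i(v_i) = v_i x_i(v_i) - integral from 0 to v_i of x_i(z) dz
        # (Theorem 1, part 2), with the integral as a Riemann sum.
        zs = np.arange(0.0, v_i, step)
        return v_i * x_i(v_i) - sum(x_i(z) for z in zs) * step

    # Fix the other bids at [7, 3]: bidder i wins iff v_i > 7, so x_i is
    # a 0/1 step function with threshold 7.
    x_i = lambda v: 1.0 if v > 7 else 0.0

    print(round(myerson_payment(x_i, 10.0), 2))   # 7.0: the second price again
    print(myerson_payment(x_i, 5.0))              # 0.0: a losing bidder pays nothing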

Revenue Equivalence

The second main tool for reasoning about mechanisms concerns the revenue of the mechanism: it is Myerson’s Revenue Equivalence Principle, which roughly says that the revenue of a truthful mechanism depends only on the allocation rule and not on the payment function. This is somewhat expected given the last theorem, since we showed that when a mechanism is truthful, the payments are totally determined by {(x,f)}.

The profit of the auctioneer is given by {\int \sum_{i=1}^n p_i(v) f(v) dv}. We can substitute {p_i(v)} using the payment formula in the last theorem, obtaining:

\displaystyle \begin{aligned} \text{profit} &= \sum_{i=1}^n \int_{V_i} p_i(v_i) f_i(v_i) dv_i \\ &= \sum_{i=1}^n \int_{V_i} \left( v_i x_i(v_i) - \int_0^{v_i} x_i(z) dz \right) f_i(v_i) dv_i \end{aligned}

We can invert the order of the integration in the second part, getting:

\displaystyle \begin{aligned} \int_{V_i} \int_0^{v_i} x_i(z) f_i(v_i) \, dz \, dv_i &= \int_{V_i} x_i(z) \left( \int_z^{\infty} f_i(v_i) \, dv_i \right) dz \\ &= \int_{V_i} x_i(z) (1 - F_i(z)) \, dz \end{aligned}

So, we can rewrite profit as:

\displaystyle  \text{profit} = \sum_{i=1}^n \int_V \left[ v_i - \frac{1 - F_i(v_i)}{f_i(v_i)} \right] x_i(v_i) f(v) dv

And that proves the following result:

Theorem 2 The seller’s expected utility from a truthful direct revelation mechanism depends only on the assignment function {x(v)}.
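
The bracketed term {v_i - (1 - F_i(v_i))/f_i(v_i)} in the profit formula is Myerson’s virtual valuation. Here is a sketch in plain Python for the classic uniform {[0,1]} case, where maximizing the expected virtual surplus recovers the well-known optimal auction: a Vickrey auction with reserve price {1/2}:

    def virtual_value(v, F, f):
        # Myerson's virtual valuation: phi(v) = v - (1 - F(v)) / f(v).
        return v - (1.0 - F(v)) / f(v)

    # Uniform [0,1] values: F(v) = v and f(v) = 1, so phi(v) = 2v - 1.
    F = lambda v: v
    f = lambda v: 1.0

    # Since profit = expected virtual surplus, the optimal single-item rule
    # sells to the highest bidder with phi(v_i) >= 0, i.e., v_i >= 1/2.
    for v in [0.3, 0.5, 0.9]:
        print(v, virtual_value(v, F, f))   # -0.4, 0.0, 0.8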

Now, to implement a revenue-maximizing mechanism we just need to find the {x(v)} that optimizes the profit functional above while still meeting the truthfulness constraints of the first theorem. This is discussed by Myerson in his paper. There is just one issue here: our analysis depends on the probability distributions {f_i}. There are various approaches to that:

  • Assume that the values of the bidders are drawn from the distributions {f_i}. The distributions are public knowledge, but the realization {v_i} is known only to bidder {i}.
  • Bidders have fixed values {v_i} (i.e., {f_i} are Dirac distributions concentrated on {v_i}), in which case the revenue maximization problem becomes trivial. It is still an interesting problem from the point of view of truthfulness, but in this case we assume that {v_i} is fixed and only player {i} knows its value.
  • The distributions exist but are unknown to the mechanism designer. In this case, he wants to design a mechanism that provides good profit guarantees against all possible distributions; the guarantees need to be established according to some benchmark. This is called prior-free mechanism design.

More references about Mechanism Design can be found in these lectures by Jason Hartline, in the original Myerson paper, or in the Algorithmic Game Theory book.