Posts Tagged ‘theory’

Mechanism Design

August 5th, 2009

In the early days of algorithm design, the goal was to solve combinatorial problems exactly and efficiently. Cook’s Theorem and the theory of NP-completeness showed us that some problems are inherently hard. How can we cope with that? By looking for approximate solutions, or by restricting attention to tractable instances of those problems. It turns out that lack of computational power (NP-hardness, in some sense) is not the only obstacle to solving problems optimally. Online algorithms propose a model where the difficulty comes from lack of information: the algorithm must make some decisions before seeing the entire input. Again, in most cases it is hopeless to find the optimal solution (the one we could compute if we knew the input ahead of time), so our goal is not to be far from it. Another example of a natural limitation appears in streaming algorithms, where we must solve a problem with limited memory. Imagine, for example, an algorithm running in a router that receives gigabits of traffic each second: it is impossible to store all this information, and yet we want to process this very large input and give an answer at some point.

An additional model is inspired by economics: there are several agents, each holding part of the input to the algorithm, and each interested in the outcome, say, each has a certain value for each possible final outcome. They are asked to reveal their part of the input, but they may lie about it to manipulate the final result in their own interest. How can we prevent that? We need to augment the algorithm with economic incentives to make sure the agents cannot harm the final solution too much. We now have to care about two things: we still want a solution not too far from optimal, but we also need to provide the right incentives. Such an algorithm augmented with incentives is called a mechanism, and this is the subject of an important field in Algorithmic Game Theory called Mechanism Design.

The simplest setting where this occurs is a matching problem. Suppose there are {n} people and one item, and I want to decide which person gets the item. Person {i} has a value {v_i} for the item. I ask people for their values and give the item to the person with the highest value; this way I maximize the total value of the matching. But wait, this is not a mechanism yet: people have no incentive to play truthfully in this game, and they may report higher valuations to get the item. The natural solution is to charge some payment {p} from whoever gets the item. But how much to charge? The answer is the Vickrey auction, where we charge the winner the second highest bid. More about the Vickrey auction and other examples of truthful mechanisms can be found in the Algorithmic Game Theory book, which is available on Tim Roughgarden’s website.
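To make this concrete, here is a minimal sketch of a Vickrey auction in Python (a toy example of my own, with made-up bids): the highest bidder wins, but pays only the second highest bid.

```python
def vickrey_auction(bids):
    """Second-price (Vickrey) auction for a single item.

    bids: dict mapping bidder -> reported value.
    Returns (winner, payment): the highest bidder wins and pays
    the second highest reported value.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    payment = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, payment

# toy example: reporting the true value v_i is a dominant strategy here
print(vickrey_auction({"alice": 10.0, "bob": 7.0, "carol": 4.0}))  # ('alice', 7.0)
```

Note that the winner’s payment does not depend on her own bid, only on the other bids, which is exactly why overbidding or underbidding cannot help.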

One of the most famous early papers in this area is “Optimal Auction Design” by Myerson, where he discusses auction mechanisms that maximize the seller’s profit. After reading some papers in this area, I thought I should go back and read the original paper by Myerson, and it was indeed a good idea, since the paper formalizes (or makes very explicit) a lot of central concepts in this area. I wanted to blog about the two main tools in mechanism design discussed in the paper: the revelation principle and the revenue equivalence theorem.

Revelation Principle

We could imagine thousands of possible ways of providing economic incentives. It seems a difficult task to look at all possible mechanisms and choose the best one. The good news is that we don’t need to look at all kinds of mechanisms: we can restrict attention to truthful revelation mechanisms. Let’s formalize that: consider {n} bidders, each with a value {v_i} for getting what they want (suppose they are single-parameter agents, i.e., they get {v_i} if they get what they want and {0} otherwise). We represent the final outcome as a vector {x \in [0,1]^n}. Here we also consider randomized mechanisms, where the outcome can be, for example, to allocate the item to bidder {i} with probability {x_i}. This vector has some restrictions imposed by the structure of the problem, e.g., if there is only one item, then we must have {\sum_i x_i \leq 1}. At the end of the mechanism, each player pays some amount {p_i} to the mechanism. Player {i} then has utility {u_i(x,p) = x_i v_i - p_i} and the total profit of the auctioneer is {\sum_i p_i}.

Let’s describe a general mechanism: each player has a set of strategies, say {\Theta_i}. Each player chooses one strategy {\theta_i \in \Theta_i} and the mechanism chooses one allocation and one payment based on the strategies of the players: {x = x(\theta_1, \hdots,\theta_n)} and {p = p(\theta_1, \hdots,\theta_n)}. Notice this includes even complicated multi-round mechanisms: in that case, the strategy space {\Theta_i} would be a very complicated description of what a player would do for each outcome of each round.

Let {V_i} be the set of possible valuations player {i} can have. Given the mechanism, each player picks a function {\theta_i: V_i \rightarrow \Theta_i}, i.e., {\theta_i(v_i)} is the strategy he would play if he observed his value to be {v_i}. The mechanism has an equilibrium if there is a set of such functions in equilibrium, i.e., no player can change his function and be better off. If such functions exist, we can implement this as a direct revelation mechanism. A direct revelation mechanism is a mechanism where the strategies are simply to reveal a value in {V_i}: I just ask each player for his valuation.

Given a complicated mechanism {x: \Theta \rightarrow [0,1]^n}, {p: \Theta \rightarrow {\mathbb R}_+^n} and equilibrium strategies {\theta_i: V_i \rightarrow \Theta_i }, I can implement it as a direct revelation mechanism just by taking:

\displaystyle x'(v_1, \hdots, v_n) = x(\theta_1(v_1), \hdots, \theta_n(v_n))

\displaystyle p'(v_1, \hdots, v_n) = p(\theta_1(v_1), \hdots, \theta_n(v_n))
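As a toy illustration (a sketch of mine, not the paper’s notation), the construction above is just function composition: the direct mechanism asks for valuations, applies the equilibrium strategy maps internally, and then runs the original mechanism.

```python
def direct_revelation(mechanism, strategies):
    """Wrap an indirect mechanism into a direct revelation mechanism.

    mechanism:  function taking strategies (theta_1, ..., theta_n) and
                returning the pair (allocation x, payments p).
    strategies: list of equilibrium strategy maps, strategies[i](v_i) = theta_i(v_i).
    Returns a mechanism (x', p') that takes reported values directly.
    """
    def direct(values):
        thetas = [s(v) for s, v in zip(strategies, values)]
        return mechanism(*thetas)
    return direct
```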

It is not hard to see that if the mechanism is {(x', p')}, the best thing for the players to do is to reveal their true valuations directly, eliminating all the complicated steps in between. One practical example of a direct revelation mechanism is the Vickrey auction. Consider the English auction, the famous auction that appears in most movies depicting auctions (or at Sotheby’s, for a concrete example): there is one item and people keep raising their bids until everyone else drops out, and the last person still bidding gets the item. The natural strategy in such an auction is to keep raising your bid as long as the current price is below your valuation {v_i} and there are still other bidders who haven’t dropped out. Clearly the person with the highest value {v_i} will get the item. Let {v_j} be the second highest value. It is not hard to see that all but one bidder will have dropped out once the price passes {v_j}, so the winner pays essentially {v_j + \epsilon}. This is exactly the Vickrey auction, which just emulates the English auction as a direct revelation mechanism. There are of course other issues; the following quote is from this paper:

“God help us if we ever take the theater out of the auction business or anything else. It would be an awfully boring world.” (A. Alfred Taubman, Chairman, Sotheby’s Galleries)

So, we can restrict our attention to mechanisms of the form {x : V \rightarrow [0,1]^n} and {p : V \rightarrow {\mathbb R}_+^n} that are truthful, i.e., where the players have no incentive not to report their true valuations. We can characterize truthful auctions using the next theorem. First, a bit of notation: let {v_{-i} = (v_1, \hdots, v_{i-1}, v_{i+1}, \hdots, v_n)}, {x_{-i} = (x_1, \hdots, x_{i-1}, x_{i+1}, \hdots, x_n)}, and so on; let {f} be the joint density of the valuations and {f_{-i}} the joint density of {v_{-i}}:

Theorem 1 An auction is truthful if and only if, given the probability distributions {f_1}, …, {f_n} over the values, we have

  1. {x_i(v_i) = \int x_i(v_i, v_{-i}) f_{-i}(v_{-i}) dv_{-i}} is monotone non-decreasing
  2. {p_i(v_i) = v_i x_i(v_i) - \int_0^{v_i} x_i(z) dz}, where {p_i(v_i) = \int p_i(v_i, v_{-i}) f_{-i}(v_{-i}) dv_{-i}}
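Here is a small numeric sketch of the payment identity in item 2 (my own toy example, not from the paper): for a monotone interim allocation rule {x_i(v_i)}, we compute the truthful interim payment by approximating the integral with a Riemann sum.

```python
import numpy as np

def truthful_payment(alloc, v, grid_points=10_000):
    """Interim payment prescribed by Theorem 1:
    p(v) = v * x(v) - integral_0^v x(z) dz  (approximated by a Riemann sum)."""
    z = np.linspace(0.0, v, grid_points)
    integral = np.sum(alloc(z)) * (v / grid_points)
    return v * alloc(v) - integral

# toy allocation rule: against one other bidder with a Uniform[0,1] value,
# the highest-value-wins rule allocates to me with probability x(v) = v
alloc = lambda v: v
for v in [0.25, 0.5, 1.0]:
    print(v, truthful_payment(alloc, v))   # matches the closed form v**2 / 2
```

The closed form {v^2/2} is exactly the expected Vickrey payment of a bidder with value {v}, which already hints at the revenue equivalence result below.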

Revenue Equivalence

The second main tool for reasoning about mechanisms concerns the revenue of the mechanism: it is Myerson’s Revenue Equivalence Principle, which roughly says that the revenue of a truthful mechanism depends only on the allocation and not on the payment function. This is somewhat expected given the last theorem, since we showed that when a mechanism is truthful, the payments are completely determined by {(x,f)}.

The profit of the auctioneer is given by {\int \sum_{i=1}^n p_i(v) f(v) dv}. We can substitute {p_i(v)} by the payment formula in the last theorem, obtaining:

\displaystyle \begin{aligned} \text{profit} &= \sum_{i=1}^n \int_{V_i} p_i(v_i) f_i(v_i) dv_i \\&= \sum_{i=1}^n \int_{V_i} \left( v_i x_i(v_i) - \int_0^{v_i} x_i(z) dz \right) f_i(v_i) dv_i \end{aligned}

We can invert the order of integration in the second term, getting:

\displaystyle \begin{aligned} \int_{V_i} \int_0^{v_i} x_i(z) f_i(v_i) dz dv_i &= \int_{V_i} x_i(z) \int_{z}^{\infty} f_i(v_i) dv_i dz \\ &= \int_{V_i} x_i(z) (1 - F_i(z)) dz \end{aligned}

So, we can rewrite profit as:

\displaystyle  \text{profit} = \sum_{i=1}^n \int_V \left[ v_i - \frac{1 - F_i(v_i)}{f_i(v_i)} \right] x_i(v) f(v) dv

And that proves the following result:

Theorem 2 The seller’s expected utility from a truthful direct revelation mechanism depends only on the assignment function {x(v)}.
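A quick Monte Carlo sketch (a toy check of mine, not from the paper) makes the theorem tangible: with {n} i.i.d. Uniform[0,1] values, the second-price auction and the first-price auction at its symmetric equilibrium (bid {\frac{n-1}{n}} of your value) use the same allocation rule, so both should give the seller expected revenue {\frac{n-1}{n+1}}.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000
values = rng.uniform(size=(trials, n))          # i.i.d. Uniform[0,1] valuations

# second-price auction: the winner pays the second highest value
second_price = np.sort(values, axis=1)[:, -2].mean()

# first-price auction at the symmetric equilibrium: everyone bids (n-1)/n * v_i,
# the bidder with the highest value wins and pays her own bid
first_price = ((n - 1) / n * values.max(axis=1)).mean()

print(second_price, first_price, (n - 1) / (n + 1))   # all close to 2/3 for n = 5
```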

Now, to implement a revenue-maximizing mechanism we just need to find an {x(v)} that maximizes the profit functional above while still meeting the truthfulness constraints of the first theorem. This is discussed by Myerson in his paper. There is just one issue here: our analysis depends on the distributions {f_i}. There are various ways to deal with that:

  • Assume the bidders’ values are drawn from the distributions {f_i}. The distributions are public knowledge, but the realized value {v_i} is known only to bidder {i}.
  • Bidders have fixed values {v_i} (i.e., {f_i} is the Dirac distribution concentrated on {v_i}), in which case the revenue-maximization problem becomes trivial. It is still an interesting problem from the point of view of truthfulness, but here we assume that {v_i} is fixed and only player {i} knows its value.
  • The distributions exist but are unknown to the mechanism designer. In this case, he wants to design a mechanism that provides good profit guarantees against all possible distributions. Those guarantees need to be established relative to some benchmark. This is called prior-free mechanism design.

More references about Mechanism Design can be found in these lectures by Jason Hartline, in the original Myerson paper, or in the Algorithmic Game Theory book.

Consistent Labeling

August 3rd, 2009

This week, Igor and I are presenting the paper “Metric clustering via consistent labeling” by Krauthgamer and Roughgarden in our Algorithms Reading Group. To prepare for the presentation, I thought writing a blog post about it was a nice idea. Consistent Labeling is a framework that allows us to represent a variety of metric problems, such as computing a separating decomposition of a metric space, a padded decomposition (both of which are the main ingredients for embedding metric spaces into dominating trees), sparse covers, metric triangulation, and so on. Here, I’ll define the Consistent Labeling Problem, formulate it as an Integer Program, and show how we can get an approximation algorithm for it by rounding its linear relaxation.

First, consider a base set {A}. We want to assign labels from {L} to the elements of {A}, respecting some constraints. First, each element {a \in A} has associated with it a subset {L_a \subseteq L} of labels it can be assigned, and each element can receive at most {k} labels. Second, there is a collection {\mathcal{C}} of subsets of {A}, and for each {S \in \mathcal{C}} there should be one label {i \in L} that is assigned to all elements of {S}. For each {S \in \mathcal{C}}, there is a penalty {w_S} for violating this constraint.

Our goal is to find a probability distribution over labelings that maximizes the expected weight of consistently labeled sets, {\sum_{S \in \mathcal{C}} w_S Pr[S \text{ is consistently labeled}]} (equivalently, minimizes the expected total penalty {\sum_{S \in \mathcal{C}} w_S Pr[S \text{ is violated}]}). Let’s formulate this as an integer program. For that, the first thing we need are decision variables: let {x_{ia}} be a {\{0,1\}}-variable indicating whether label {i \in L} is assigned to element {a \in A}. Let the variable {y_{iS}} indicate that label {i} was assigned to all elements of {S}, and let {z_S} indicate that set {S} is consistently labeled. An integer programming formulation can then be expressed as:

\displaystyle  \left. \begin{aligned} & \max \sum_{S \in \mathcal{C}} w_S z_S \text{ s.t. } \\ & \qquad \left\lbrace \begin{aligned} & \sum_{i \in L} x_{ia} \leq k & \forall a \in A \\ & y_{iS} \leq x_{ia} & \forall S \in \mathcal{C}, a \in S, i \in L\\ & z_S \leq \sum_{i \in L} y_{iS} & \forall S \in \mathcal{C} \\ & z_S \leq 1 & \forall S \in \mathcal{C} \\ & x_{ia} = 0 & \forall i \notin L_a \\ \end{aligned} \right. \end{aligned} \right. \ \ \ \ \ (1)

It is not hard to see that if {x,y,z} are {\{0,1\}}-variables then the formulation corresponds to the original problem. Now, let’s relax it to a Linear Program and interpret the fractional values as probabilities. The rounding procedure we use is a generalization of the one in “Approximation algorithms for classification problems” by Kleinberg and Tardos: until every object has {k} labels, repeat the following: pick {i \sim \text{Uniform}(L)} and {t \sim \text{Uniform}([0,1])} and assign label {i} to all objects with {x_{ia} > t}. At the end, keep only the first {k} labels assigned to each object.
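Here is a minimal sketch of this rounding in Python (the fractional solution {x_{ia}} would come from solving the LP relaxation above; in the toy example below it is simply hard-coded, repeated labels are skipped, and every element is assumed to have some label with positive LP value so the loop terminates):

```python
import random

def round_labeling(x, labels, k, rng=random):
    """Kleinberg-Tardos style rounding for consistent labeling.

    x:      dict (i, a) -> fractional LP value x_{ia} in [0, 1]
    labels: the label set L
    k:      maximum number of labels per element
    Returns a dict mapping each element to its (at most k) assigned labels.
    """
    elements = {a for (_, a) in x}
    assigned = {a: [] for a in elements}
    while any(len(ls) < k for ls in assigned.values()):
        i = rng.choice(list(labels))          # i ~ Uniform(L)
        t = rng.random()                      # t ~ Uniform([0,1])
        for a in elements:
            if x.get((i, a), 0.0) > t and i not in assigned[a] and len(assigned[a]) < k:
                assigned[a].append(i)         # keep only the first k labels
    return assigned

# toy instance: two elements, three labels, at most k = 1 label each
x = {(0, "u"): 0.5, (1, "u"): 0.5, (1, "v"): 0.7, (2, "v"): 0.3}
print(round_labeling(x, labels=[0, 1, 2], k=1))
```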

What we just described is a procedure that, given a solution to the relaxed LP, produces a randomized labeling of the objects. Now we need to prove that this randomized labeling is good in expectation, that is, that the expected weight of consistently labeled sets, {\sum_{S \in \mathcal{C}} w_S Pr[S \text{ is consistently labeled}]}, is close to that of the optimal deterministic labeling. The authors prove that it is within a factor of {2 f_{max}}, where {f_{max} = \max_{S \in \mathcal{C}} \vert S \vert}.

Theorem 1 For each {S \in \mathcal{C}}, the probability that {S} is consistently labeled is lower bounded by {1 - \left( 1 - \frac{z_S}{k \vert S \vert} \right)^k}.

Proof: Since we are trying to lower bound the probability that {S} is consistently labeled, we may consider only the event that all elements of {S} receive the same label in a single iteration. Let’s estimate the probability {q} that this happens in a fixed iteration {j}:

\displaystyle q = \sum_{i \in L} \frac{1}{\vert L \vert} Pr[S \text{ is all labeled with } i \text{ in iteration } j]

If {i \in L} is chosen in iteration {j}, all elements are labeled with {i} if {t < x_{ia}} for all {a \in S}, so the probability is {\min_{a \in S} x_{ia} \geq y_{iS}}. So, we have:

\displaystyle q \geq \sum_{i \in L} \frac{1}{\vert L \vert} y_{iS} \geq \frac{z_S}{\vert L \vert}

Now, let {p} be the probability that the set {S} is hit by the labeling in iteration {j}. If label {i \in L} is chosen, the set {S} is hit if {t < \max_{a \in S} x_{ia}}, therefore:

\displaystyle p = \sum_{i \in L} \frac{1}{\vert L \vert} \max_{a \in S} x_{ia} \leq \sum_{i \in L} \frac{1}{\vert L \vert} \sum_{a \in S} x_{ia}

Inverting the order of summation, we get:

\displaystyle p \leq \frac{1}{\vert L \vert} \sum_{a \in S} \sum_{i \in L} x_{ia} \leq \frac{1}{\vert L \vert} \sum_{a \in S} k = \frac{k \vert S \vert}{\vert L \vert}

The probability that {S} is consistently labeled is at least the probability that it is consistently labeled in a single iteration before the set has been hit {k} times. In one iteration three things may happen: the set is not hit, it is hit but not consistently labeled, or it is consistently labeled. The figure below tracks how many times the set has been hit; the down arrows represent the event that the set {S} is consistently labeled:

[Figure: consist_lab]

A standard trick in these cases is to disregard the self-loops and normalize the probabilities. This way, the probability that {S} is consistently labeled is:

\displaystyle \frac{q}{p} \sum_{j=0}^{k-1} \left( \frac{p-q}{p} \right)^j = \frac{q}{p} \cdot \frac{ 1 - \left( \frac{p-q}{p} \right)^k }{1 - \frac{p-q}{p} } = 1 - \left( 1 - \frac{q}{p} \right)^k

Now we just use {p \leq \frac{k \vert S \vert}{\vert L \vert}} and {q \geq \frac{z_S}{\vert L \vert}} to obtain the desired result. \Box
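The proof boils down to the simple “covered vs. hit” process above, so here is a small simulation sketch (my own sanity check, not from the paper) comparing the empirical probability of being covered before being hit {k} times with the closed form {1 - (1 - q/p)^k}:

```python
import random

def covered_before_k_hits(p, q, k, trials=200_000, rng=random):
    """Each iteration: with probability q the set is consistently labeled
    (covered), with probability p - q it is hit but not covered, and
    otherwise nothing happens. Returns the empirical probability that the
    set is covered before accumulating k hits."""
    covered = 0
    for _ in range(trials):
        hits = 0
        while hits < k:
            u = rng.random()
            if u < q:              # covered in this iteration
                covered += 1
                break
            elif u < p:            # hit, but not covered
                hits += 1
    return covered / trials

p, q, k = 0.3, 0.1, 4
print(covered_before_k_hits(p, q, k), 1 - (1 - q / p) ** k)  # both about 0.80
```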

The approximation factor of {2 f_{max}} follows straight from the previous theorem by considering the following inequalities:

\displaystyle \left( 1 - \frac{a}{k}\right)^k \leq e^{-a} \leq 1 - \frac{a}{2}

and noting that the optimal value of the LP relaxation is an upper bound on the value of the best deterministic labeling.

Theorem 2 The rounding procedure gives a randomized algorithm that, in expectation, achieves a {2 f_{max}} approximation to the optimal consistent labeling.

More on eigenvalues and social networks

July 29th, 2009

Following the suggestions of some of my friends, I decided to look at the other eigenvalues. The intuition is that if an eigenvalue is close to 1, it should also represent a good partition of the graph. Therefore I plotted the same graph as in my last post, but associating each node v with the coordinates (x_3(v), x_n(v)), where x_i is the eigenvector corresponding to the i-th eigenvalue. The first eigenvalues are indeed very close to 1: \lambda_1 = 1.0 (as expected), \lambda_2 = 0.9925, \lambda_3 = 0.9318, \lambda_4 = 0.8596, \lambda_5 = 0.8493, … To my surprise, plotting (x_3(v), x_n(v)) gave a nice and natural separation between the nodes representing people from my family (in yellow) and the other categories:

[Figure: orkut_l3_ln]

Also, I looked at the version of this graph with people’s names, and I noticed that people with extreme values of x_2(v) are very good representatives of their respective clusters. In the first graph (in the other blog post), we got a good separation between my high-school friends (in green) and my undergrad friends (in blue). The nodes at the extreme right are the friends I used to hang out with the most in college – they are definitely in that cluster. The same goes for the friends at the extreme left: they are definitely in the other cluster.

Now, let’s turn our attention to the picture above. Not only were the yellow nodes clustered together, but the blue nodes appear to all be in the same place, as if the main cluster identified by the previous eigenvector collapsed to a single point. If we look further, at the fourth eigenvector, the same phenomenon happens again:

[Figure: orkut_l4_ln]

See how all the nodes that were somehow “important” in the last two graphs collapse to the same place, and now we are trying to sort out the remaining nodes. How far can we go? Is there some clean recursive way of extracting multiple (possibly overlapping) communities in those large graphs just by looking at a local neighborhood? I haven’t reviewed much of the literature yet.
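For reference, here is roughly how such plots can be produced (a sketch with numpy/networkx, not necessarily how the original plots were generated; the file name friends.edgelist and the choice of eigenvectors are placeholders for whatever local friendship graph is being analyzed):

```python
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# load the local friendship graph (hypothetical file name)
G = nx.read_edgelist("friends.edgelist")
A = nx.to_numpy_array(G)
W = A / A.sum(axis=1, keepdims=True)   # random-walk matrix D^{-1} A (assumes no isolated nodes)

# W is similar to the symmetric normalized adjacency, so its spectrum is real
# and lies in [-1, 1]; sort the eigenpairs by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eig(W)
order = np.argsort(-eigvals.real)
eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]

# scatter the nodes by two chosen eigenvectors, e.g. the 3rd and the last one
plt.scatter(eigvecs[:, 2], eigvecs[:, -1], s=10)
plt.xlabel("x_3(v)")
plt.ylabel("x_n(v)")
plt.show()
```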

Eigenvalue density

I am also curious about the distribution of the eigenvalues of that graph. The following plot shows the “eigenvalue density”. We know that \lambda_i \in [-1,1], but how are they distributed in this interval? The largest eigenvalue is 1 and the smallest is -0.7233:

[Figure: eigenvalue_density]

Most of the eigenvalues are concentrated on the positive side (this graph really seems far from being bipartite). Another interesting question is: how can we interpret the shape of this curve? Which characteristics of the graph can we read off from it? I still don’t know how to answer these questions, but I think some spectral graph theory might help. I really need to start reading about that.

Next, I plotted the points (i, \lambda_i) in a graph and got the following pattern:

[Figure: spectrum]

This seems like a nice symmetric pattern, and there should be a reason for it. I don’t know to what extent this is a property of real-world graphs or of graphs in general. There is this paper by Mihail and Papadimitriou analyzing the structure of eigenvalues in real-world graphs. They propose a model explaining why the highest eigenvalues seem to follow a power law, i.e., \lambda_i = \frac{\lambda_1}{\alpha^i} for the first k eigenvalues. They claim the eigenvalues are close to \sqrt{d_i}, where d_i is the i-th highest degree. Therefore, since the d_i follow a power law, the \lambda_i should also follow one.

Although the whole social network follows a power law (more about power laws in this nice survey by Mitzenmacher), we are dealing with a local neighborhood, and even the degrees in this graph don’t follow a power law, as we can see in the following log-log plot of (i, \sqrt{d_i}) and (i, \lambda_i):

[Figure: spectrum_log]

Even though they do not follow a power law, the two curves still have a similar shape, which I can’t exactly explain right now. Papadimitriou suggests that the fact that \lambda_i \approx \sqrt{d_i} is a negative result, since \lambda_i would then reflect only a local property of the graph. Here the eigenvalues seem to follow a similar pattern; however, we saw that we can extract some nice global, structural information from them, so they don’t seem to reflect only local properties.
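And a short sketch (reusing A and eigvals from the previous snippet) of how such a log-log comparison between the sorted eigenvalues and \sqrt{d_i} can be drawn:

```python
import numpy as np
import matplotlib.pyplot as plt

# degrees sorted in decreasing order: d_1 >= d_2 >= ...
degrees = np.sort(A.sum(axis=1))[::-1]
ranks = np.arange(1, len(eigvals) + 1)

positive = eigvals > 0                 # a log scale only makes sense for positive values
plt.loglog(ranks[positive], eigvals[positive], ".", label="lambda_i")
plt.loglog(ranks, np.sqrt(degrees), ".", label="sqrt(d_i)")
plt.xlabel("i")
plt.legend()
plt.show()
```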