
Prisoners and boxes

July 13th, 2009

At EC, Sean, one of my friends from UBC, told me an interesting puzzle. I liked both the problem and the solution a lot, and since the solution has a lot of interesting ideas, I felt like writing about it. EC also reminded me that puzzles are fun and that I should spend a bit more of my time solving those nice math problems. The problem goes like this:

There are 100 prisoners and 3 different rooms, say A, B and C. In the beginning they are all in room A. In room B there are 100 boxes, each one containing the name of a different prisoner. One at a time, the prisoners are brought from A to B, and in B each can open 50 boxes. Then they are brought to room C. (They cannot change the state of room B, so it is the same as having 100 identical copies of room B and bringing each prisoner to one of the copies. They cannot leave signals in the room.) If each prisoner finds his own name, they are all freed. The only thing they can do is agree on a certain strategy while they are all in room A and then follow that strategy. If they don’t succeed, the names are randomly rearranged and they can try again the next day. Find a strategy for them so that they succeed in less than one week (in expectation).

A few things I need to draw your attention to: (i) it is purely a math puzzle; there are no word games, and if something is obscure, that is my fault and not intentional; (ii) the fact that the names are rearranged in the boxes randomly is very important, and the solution doesn’t work if the distribution is adversarial; (iii) if each prisoner picks 50 boxes at random, then each has probability 1/2 of finding his own name, so they take 2^{100} days in expectation, which is a lot. How do we reduce this to about seven days?

Structure of random permutations

The solution has to do with the structure of a permutation drawn at random. Consider one of the n! permutations, sampled uniformly at random. We can write each permutation as a product of disjoint cycles. For example, the permutation \sigma = \begin{pmatrix} 1&2&3&4&5&6\\4&5&6&3&2&1 \end{pmatrix} can be written as a product of two disjoint cycles: \sigma = ( 1,4,3,6 ) (2,5). In the language of boxes and prisoners, a cycle means: box a_1 contains the name of prisoner a_2, box a_2 contains the name of prisoner a_3, and so on, until box a_k contains the name of prisoner a_1. This is the cycle (a_1, a_2, \hdots, a_k). If all the cycles have length at most n/2, where n is the number of prisoners, then there is a simple strategy: we can consider that each prisoner corresponds to one box a priori (say prisoners have numbers 1, \hdots, n and boxes also have numbers 1, \hdots, n). The permutation is defined by the function \sigma from the box number to the number of the prisoner whose name is inside the box. Now, prisoner k opens box k, reads the name of the prisoner inside, then opens the box with that number, and continues following the cycle. Since his own name lies on the same cycle as box k, he finds it after at most as many openings as the length of that cycle; so if no cycle is longer than n/2, every prisoner succeeds.
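Here is a minimal sketch of this strategy in Python (0-indexed; the function and variable names are mine), which also estimates the success probability empirically:

import random

def prisoners_succeed(n=100, opens=50):
    # one day: a random assignment of names to boxes, with every
    # prisoner following the cycle that starts at his own box
    boxes = list(range(n))
    random.shuffle(boxes)        # boxes[b] = name of the prisoner inside box b
    for prisoner in range(n):
        box = prisoner           # start at the box with his own number
        for _ in range(opens):
            if boxes[box] == prisoner:
                break            # found his own name
            box = boxes[box]     # follow the cycle
        else:
            return False         # this prisoner ran out of openings
    return True

trials = 10000
wins = sum(prisoners_succeed() for _ in range(trials))
print(wins / trials)             # around 0.31, as computed below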

The probability of success is the same as the probability that a permutation drawn at random has no cycle of length greater than n/2. Let \phi_p be the probability that a random permutation has a cycle of length p, where p > n/2 (note that there can be at most one such cycle). This is a simple exercise in combinatorics: there are \begin{pmatrix} n \\ p \end{pmatrix} ways of picking the p elements that will form the p-cycle, (p-1)! cycles that can be formed with those p elements, and (n-p)! permutations for the remaining n-p elements. So, we have:

\displaystyle \phi_p = \frac{1}{n!}  \begin{pmatrix} n \\ p \end{pmatrix} (n-p)! (p-1)! = \frac{1}{p}

And therefore the probability of having a cycle of length more than n/2 is \sum_{p=1+n/2}^n \frac{1}{p}. For n = 100, this is about 0.69, so the expected number of days before a permutation with no cycle longer than n/2 appears is \frac{1}{1-0.69} \approx 3.2. Much less than one week, actually!
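Checking the arithmetic in a couple of lines of Python:

from fractions import Fraction

n = 100
# the probability that a random permutation of n elements has a cycle of
# length p > n/2 is 1/p, and these events are disjoint, so we just sum
p_fail = sum(Fraction(1, p) for p in range(n // 2 + 1, n + 1))
print(float(p_fail))            # ~0.688
print(1 / (1 - float(p_fail)))  # expected number of days: ~3.2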

Structure and Randomness

The solution to this puzzle exploits structure in randomness. A lot of work in computer science is based on looking at the structure (or expected structure) of a randomly chosen object. A very elegant kind of proof consists in creating a random process that generates objects and proving that some object exists because it occurs with positive probability. One simple, yet very beautiful, example is the following theorem: given any logic expression in 3-CNF form, i.e., an expression of the kind:

\displaystyle \bigwedge_i (a_{i1} \vee a_{i2} \vee a_{i3})

where each a_{ij} is a literal (a variable x_k or its negation \tilde x_k), with the three variables in each clause distinct, there is an assignment of the variables to \{0,1\} that satisfies at least 7/8 of the clauses. Even though the statement of this result has no probability in it, the proof is probabilistic. Consider a random assignment to the variables. Then each clause is satisfied with probability 7/8. By linearity of expectation, the expected number of satisfied clauses is 7/8 of the total. Since this is the expected number of satisfied clauses, there must be at least one assignment satisfying at least 7/8 of them.
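A small sketch of the argument in Python (the random formula generator is just for illustration; the theorem holds for any 3-CNF formula whose clauses mention three distinct variables):

import random

def random_3cnf(num_vars, num_clauses):
    # a clause is a list of literals (variable index, sign),
    # over three distinct variables
    return [[(v, random.choice([True, False]))
             for v in random.sample(range(num_vars), 3)]
            for _ in range(num_clauses)]

def num_satisfied(formula, assignment):
    return sum(any(assignment[v] == sign for v, sign in clause)
               for clause in formula)

num_vars, num_clauses = 50, 1000
formula = random_3cnf(num_vars, num_clauses)

# a clause fails only when all 3 literals fail: probability 1/8, so a
# random assignment satisfies 7/8 of the clauses in expectation
trials = 2000
avg = sum(num_satisfied(formula, [random.random() < 0.5 for _ in range(num_vars)])
          for _ in range(trials)) / trials
print(avg / num_clauses)   # close to 7/8 = 0.875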

This kind of proof is called the probabilistic method. There are a lot of other cool things exploring randomness in computer science: in the design of algorithms, cryptography, complexity theory, … Recently I read a nice blog post by Lipton where he discusses ways he would try to solve the P vs NP problem. One comment I found really interesting and exciting (but that I don’t quite understand yet) is that we could, maybe, separate 3-SAT instances into “random” and “structured” instances and try to use different methods exploiting randomness or structure in each of them.


Categories: puzzles

Predicting the Present

July 13th, 2009

I am coming back from EC’09, where I presented my first paper from grad school, Sponsored Search Equilibria for Conservative Bidders, in the AdAuctions Workshop. It was very nice being at a major conference and talking to a lot of people about my work and about general topics in game theory. I’ll try to blog these days about some nice talks I attended and about things I learned there. Let’s start with the keynote talk of the AdAuctions Workshop, given by Hal Varian.

Varian is the chief economist at Google, and in this talk he presented his work on Predicting the Present [more in the Google Research blog]. According to Yogi Berra, making predictions is hard, especially about the future. That’s why we turn our attention to the present. His goal in this research is to use the data publicly available in Google Trends to make predictions about economic indicators. A more useful interface is Google Insights for Search, where you can see the search statistics broken down by keyword, category, location, time, … and also download a CSV file with the data.

[Screenshot: Google Trends graph of real-estate-related search queries]

For example, consider the screenshot above about the growth of searches for keywords related to real estate. First, notice that there is seasonality in the searches: people tend to be much less interested in buying houses at the end of the year, and there is an annual peak in the summer months. Besides this pattern that repeats itself every year, there is a global downward trend in real-estate-related searches, possibly because of the economic crisis due to mortgages, loans, … The question the authors try to address is: what can we say about the economy in general just by looking at the search profiles at Google?

Real estate was the first example given in the talk, but there are several other interesting questions. The authors compare, for example, the graph of sales of Chevrolet and Toyota to the searches for those automobile brands, and they get a pretty good fit. There are isolated points where the value predicted using Google Trends differs considerably from the sales: an interesting point here is that these are exactly some big discounts (like “employee price for everyone”) given by those companies. This is an efficient way of measuring how effective those big marketing campaigns are.

After this talk, I was amazed by the amount of data available on the web for making predictions. Google Trends data can be downloaded in CSV format, which makes it easy to start making experiments with it. Other talks drew my attention to other sources of data on the web. Prediction markets, like Intrade, for example, make the prices of their securities available for download. This gives us the evolution of the beliefs of the traders about a specific topic.
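As a taste of how easy it is to start playing with such a CSV export, here is a minimal sketch (the file name and column names are made up; the actual export format may differ):

import pandas as pd

# hypothetical export from Google Insights for Search: one row per
# week, with the normalized search volume in an "interest" column
df = pd.read_csv("real_estate_queries.csv", parse_dates=["week"])
df = df.set_index("week").sort_index()

# a centered 52-week rolling mean smooths away the yearly seasonality,
# leaving the long-term trend; the residual isolates the seasonal part
trend = df["interest"].rolling(window=52, center=True).mean()
seasonal = (df["interest"] - trend).dropna()

print(trend.dropna().tail())                          # long-term trend
print(seasonal.groupby(seasonal.index.month).mean())  # summer peak, December dip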

I am very interested in data where we can see the evolution over time and analyze which kinds of social phenomena operate on it. Another cool dataset to play with along the “time axis” is the Wikipedia dataset. In Wikimedia Downloads, it is possible to download an entire XML dump of Wikipedia. You don’t even need to crawl: all the text, links, … it is all there. There is an option of downloading the entire edit history of each article; this way, we know the exact state of Wikipedia at each point in time. Other sources of data around the web that I found interesting and would very much like to do something with are Google Fusion Tables and data.gov.

More interesting stuff about EC soon…

Categories: theory

Submodularity

June 22nd, 2009

Last week I heard about submodularity at least twice, which made me want to write about it. First, it was in the talk by Carlos Guestrin at the first CompSust conference. His very interesting point was: machine learning has exploited convexity at the core of many of its algorithms (SVMs, for example). We should look for other general models, beyond convexity, that have interesting properties around which we could design algorithms. He proposes submodularity as a natural candidate.

The idea of submodular functions generalizes, in a certain way, the concept of concave functions. The main property captured is the diminishing returns property. Consider a set {S} and the set {2^S} of its subsets. We say that a function {f: 2^S \rightarrow \mathbb R} is submodular if:

\displaystyle f(S \cup \{z\}) - f(S) \geq f(T \cup \{z\}) - f(T) \text{ for } S \subseteq T, z \notin T

that is, adding a new element {z} to a larger set has a smaller marginal gain than adding it to a smaller set. This is very natural in a lot of circumstances: for example, let {S} be a set of books (or papers) and {f(S)} the knowledge you get by reading them. Since the topics overlap, reading an extra book when you have already read {100} has a smaller marginal gain (or at least no greater) than reading it when you have read only {10}. One can think of submodularity as a natural generalization of concave functions, which also have this diminishing returns property. Actually, if {\phi} is a concave function, then {f(S) = \phi(\vert S \vert)} is submodular.
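To make the definition concrete, here is a brute-force check of it on a tiny ground set (exponential time, so toy examples only; the code and names are mine):

from itertools import chain, combinations

def subsets(elements):
    return chain.from_iterable(combinations(elements, r)
                               for r in range(len(elements) + 1))

def is_submodular(f, ground):
    # check f(S + z) - f(S) >= f(T + z) - f(T) for all S inside T, z outside T
    for T in map(set, subsets(ground)):
        for S in map(set, subsets(T)):
            for z in ground - T:
                if f(S | {z}) - f(S) < f(T | {z}) - f(T):
                    return False
    return True

ground = set(range(5))
print(is_submodular(lambda S: len(S) ** 0.5, ground))  # concave phi(|S|): True
print(is_submodular(lambda S: len(S) ** 2, ground))    # convex phi(|S|): False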

In Cascading Behavior in Networks, Jon Kleinberg described a very interesting submodular function: consider a graph {G = (V,E)} and an epidemic spreading through the nodes. Suppose an initial set {S \subseteq V} of infected nodes. As soon as a node {u} becomes infected, each node {v} such that {(u,v) \in E} becomes infected with probability {p}. Let {f(S)} be the expected number of infected nodes in the end. It can be proved that {f} is submodular. One can think of this infection as some piece of information spreading in a network (say, the blogosphere). Then, if we can start spreading this information from {k} nodes, which nodes should we pick to make sure most nodes get the information in the end?
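Since {f} is defined as an expectation, the natural way to evaluate it is by Monte Carlo simulation. A sketch (graph given as an adjacency list; the names are mine):

import random

def spread_once(graph, seeds, p):
    # one run of the epidemic: each newly infected node u tries to
    # infect each neighbor v independently with probability p
    infected, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v in graph[u]:
            if v not in infected and random.random() < p:
                infected.add(v)
                frontier.append(v)
    return len(infected)

def f(graph, seeds, p, trials=1000):
    # Monte Carlo estimate of the expected number of infected nodes
    return sum(spread_once(graph, seeds, p) for _ in range(trials)) / trials

graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(f(graph, {0}, p=0.3))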

Another example is vertex cover: we can think of {S \subseteq V} as a set of nodes in a graph and {f(S)} as the number of edges covered by those nodes. It is not hard to see that this function is submodular. Set cover and other covering problems can also be modelled with submodular functions. Normally, the main question is something like: how can I maximize {f} given that I can pick at most {k} elements for {S}? Or, stated in mathematical terms:

\displaystyle \max_{\vert S \vert \leq k} f(S)

The fact that {f} is submodular makes it easy to approximate this problem using a greedy algorithm, thanks to the following very interesting result:

Theorem 1 (Nemhauser et al., 78) Given a monotone (i.e., {f(S) \leq f(T)} for {S \subseteq T}) submodular function {f : 2^X \rightarrow {\mathbb R}_+} with {f(\emptyset) = 0}, the set {S = S_k} obtained by taking {S_0 = \emptyset} and

\displaystyle S_{i+1} = S_i \cup \{z_i\} \text{ where } z_i \in \text{argmax}_{z \notin S_i} f(S_i \cup \{z\})

is a {\left( 1 - \frac{1}{e} \right)} approximation to {\max_{\vert S \vert \leq k} f(S)}.

Proof: First, notice that for any sets {A} and {B = \{b_1, \hdots, b_n\}}:

\displaystyle \begin{aligned}f(A \cup B) & = f(A) + \sum_{j=1}^n f(A \cup \{b_1, \hdots, b_j \}) - f(A \cup \{b_1, \hdots, b_{j-1} \}) \\ & \leq f(A) + \sum_{j=1}^n f(A \cup \{b_j \}) - f(A) \end{aligned}

where the last step follows by submodularity. Let {T} be the optimal solution to the problem. By monotonicity and the inequality above (with {A = S_i} and {B = T}), we have:

\displaystyle f(T) \leq f(S_i \cup T) \leq f(S_i) + \sum_{t \in T} f(S_i \cup \{ t \}) - f(S_i) \leq f(S_i) + \sum_{t \in T} f(S_{i+1}) - f(S_i)

because of our choice of {z_i}. So, if {\delta_i = f(S_i) - f(S_{i-1})}, then, since {\vert T \vert \leq k}, we know that:

\displaystyle f(T) \leq f(S_i) + k \delta_{i+1} \Rightarrow \delta_{i+1} \geq \frac{f(T) - f(S_i)}{k}

and therefore:

\displaystyle f(S_{i+1}) = f(S_i) + \delta_{i+1} \geq \frac{1}{k} f(T) + \left( 1 - \frac{1}{k} \right) f(S_i)

Solving that recursion, it is easy to see that:

\displaystyle f(S_i) \geq \left[ 1 - \left( 1 - \frac{1}{k} \right)^i \right] f(T)
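To unroll the recursion: the bound holds trivially for {i = 0}, and if it holds for {i}, the recursion gives

\displaystyle f(S_{i+1}) \geq \frac{1}{k} f(T) + \left( 1 - \frac{1}{k} \right) \left[ 1 - \left( 1 - \frac{1}{k} \right)^i \right] f(T) = \left[ 1 - \left( 1 - \frac{1}{k} \right)^{i+1} \right] f(T)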

which gives the {1 - 1/e} bound, since {\left( 1 - \frac{1}{k} \right)^k \leq 1/e}. \Box
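A sketch of this greedy in Python, using the edge-coverage function from the vertex cover example above (the function names are mine):

def greedy_max(f, ground, k):
    # greedy for max f(S) s.t. |S| <= k: repeatedly add the element with
    # the largest marginal gain (f(S) is fixed within one step, so
    # maximizing f(S + z) maximizes the gain)
    S = set()
    for _ in range(k):
        z = max(ground - S, key=lambda x: f(S | {x}))
        S.add(z)
    return S

# f(S) = number of edges covered by the nodes in S: monotone submodular
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
cover = lambda S: sum(1 for (u, v) in edges if u in S or v in S)
print(greedy_max(cover, set(range(6)), k=2))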

The second time I heard about submodularity last week was in a nice talk in the Theory Discussion Group about the paper Approximating submodular functions everywhere. Basically, there is a monotone submodular function {f:2^{[n]} \rightarrow \mathbb R} whose values {f(S)} are accessed through an oracle. The paper shows how to construct a function {\hat{f}(S)} that approximates {f(S)} everywhere, calling the oracle only a {poly(n)} number of times. The approximation is such that:

\displaystyle \hat{f}(S) \leq f(S) \leq \tilde{O}(\sqrt{n}) \hat{f}(S)

The paper is very deep, so I will not comment on the paper itself, but rather on some nice things about submodular functions and polymatroids that I learned in the introduction of the talk. The polymatroid is the following polyhedron associated with a submodular function:

\displaystyle P_f = \left\{ x \in {\mathbb R}_+^n; x(S) \leq f(S), \forall S \subseteq [n] \right\}

where {x(S) = \sum_{i \in S} x_i}.

One nice thing about those objects is that we can easily optimize over them using a greedy algorithm:

Theorem 2 Consider a weight function {w:[n] \rightarrow {\mathbb R}_+} and a monotone submodular function {f:2^{[n]} \rightarrow {\mathbb R}_+}. Then the problem:

\displaystyle \max_{x \in P_f} \langle w, x\rangle

can be solved by the following greedy algorithm: order the elements of the set according to their {w}-value, i.e., {[n] = \{ s_1, \hdots, s_n\}} with {w(s_1) \geq w(s_2) \geq \hdots \geq w(s_n)}, and define

\displaystyle x(s_i) = f(\{s_1, \hdots, s_i\}) - f(\{s_1, \hdots, s_{i-1}\})

Proof: We need to prove that {x} is a feasible point of {P_f} and then that {x} is optimal. (Note that {x(s_i) \geq 0} by monotonicity.) For feasibility, we need to prove that {x(U) \leq f(U)} for all {U \subseteq [n]}. We prove it for all sets {U \subseteq \{ s_1, \hdots, s_i \}}, for all {i}, by induction on {i}. For {i = 0}, it is trivial. Supposing we have proved it for {i}, let’s prove it for {i+1}. Take {U \subseteq \{ s_1, \hdots, s_{i+1} \}}. If {s_{i+1} \notin U}, we are done by the induction hypothesis. If not:

\displaystyle \begin{aligned} x(U) & = x(U \setminus \{ s_{i+1} \}) + x(s_{i+1}) \\ & \leq f(U \setminus \{ s_{i+1} \}) + f(\{s_1, \hdots, s_{i+1}\}) - f(\{s_1, \hdots, s_{i}\}) \leq f(U) \end{aligned}

where the last inequality follows from submodularity, since {U \setminus \{ s_{i+1} \} \subseteq \{s_1, \hdots, s_i\}}.

Now, we need to prove optimality and this is essentially a primal-dual argument. The dual of the linear program:

\displaystyle \max \langle w, x\rangle \text{ s.t. } x(U) \leq f(U) \; \forall U \subseteq [n]; \; x \geq 0

is the program:

\displaystyle \min \sum_{U \subseteq [n]} f(U) y(U) \text{ s.t. } \sum_{U; i \in U} y(U) \geq w_i \; \forall i \in [n]; \; y(U) \geq 0

Consider the following dual solution: {y(\{s_1, \hdots, s_i\}) = w(s_i) - w(s_{i+1})} (taking {w(s_{n+1}) = 0}) and {y(U) = 0} for all other sets. It is feasible, since {\sum_{U \ni s_i} y(U) = \sum_{j \geq i} w(s_j) - w(s_{j+1}) = w(s_i)}, and a short calculation (summation by parts) shows that its objective value matches that of the primal solution {x}. Both are therefore optimal. \Box
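And a sketch of this greedy in code (again the names are mine, and ties in the ordering are broken arbitrarily):

def polymatroid_greedy(f, w):
    # maximize <w, x> over P_f: sort the elements by decreasing weight
    # and assign each the marginal value of f along that order
    order = sorted(w, key=w.get, reverse=True)   # s_1, ..., s_n
    x, prefix, prev = {}, set(), 0
    for s in order:
        prefix.add(s)
        value = f(prefix)
        x[s] = value - prev    # f({s_1,...,s_i}) - f({s_1,...,s_{i-1}})
        prev = value
    return x

# example: f = edge-coverage function again, with weights on the nodes
edges = [(0, 1), (1, 2), (2, 3)]
cover = lambda S: sum(1 for (u, v) in edges if u in S or v in S)
print(polymatroid_greedy(cover, {0: 3.0, 1: 2.0, 2: 2.0, 3: 1.0}))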

Notice that the proof has a “matroid flavour”, and that is where the name polymatroid comes from. Note that the rank function of a matroid is a submodular function and that the polymatroid of the rank function is the independent set polytope of the matroid. More about it in a later blog post.