Submodularity
Last week I heard about submodularity at least twice, which made me want to write about it. The first time was in Carlos Guestrin's talk at the first CompSust conference. His very interesting point was: machine learning has exploited convexity at the core of many of its algorithms (SVMs, for example). We should look for other general structures, besides convexity, that have interesting properties and around which we could design algorithms. He proposes submodularity as a natural candidate.
The idea of submodular functions generalizes, in a certain way, the concept of concave functions. The main property captured is the diminishing returns property. Consider a ground set $N$ and the set $2^N$ of its subsets. We say that a function $f : 2^N \to \mathbb{R}$ is submodular if:

$f(A \cup \{x\}) - f(A) \geq f(B \cup \{x\}) - f(B), \quad \forall A \subseteq B \subseteq N, \ x \notin B$
that is, adding a new element to a larger set yields a smaller marginal gain than adding it to a smaller set. This is very natural in many circumstances: for example, let $N$ be a set of books (or papers) and $f(S)$ the knowledge you get by reading the books in $S$. Since the topics overlap, reading an extra book when you have already read the books in $B$ gives a smaller marginal gain (or at least no greater) than reading it when you have read only $A \subseteq B$. One can think of submodularity as a natural generalization of concavity, which also has this diminishing returns property. In fact, if $\phi$ is a concave function, then $f(S) = \phi(\vert S \vert)$ is submodular.
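Since the definition only quantifies over finitely many subsets, the diminishing returns inequality can be checked by brute force on a small ground set. Here is a minimal Python sketch (the helper `is_submodular` and the tolerance are my own, not from any library) confirming the concave-of-cardinality fact for $\phi(t) = \sqrt{t}$:

```python
from itertools import combinations
from math import sqrt

def is_submodular(f, ground, tol=1e-9):
    """Brute-force check of diminishing returns:
    f(A | {x}) - f(A) >= f(B | {x}) - f(B) for all A <= B, x not in B."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for x in ground:
                if x in B:
                    continue
                if f(A | {x}) - f(A) < f(B | {x}) - f(B) - tol:
                    return False
    return True

# phi(|S|) for concave phi (here the square root) is submodular
print(is_submodular(lambda S: sqrt(len(S)), range(4)))   # True
# ...while a convex function of |S| is not
print(is_submodular(lambda S: len(S) ** 2, range(4)))    # False
```

Of course this is exponential in the ground set, so it is only a sanity check, not something to run on real instances.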
In Cascading Behavior in Networks, Jon Kleinberg described a very interesting submodular function: consider a graph $G = (V, E)$ and an epidemic spreading through the nodes. Suppose an initial set $S \subseteq V$ of infected nodes. As soon as a node $u$ becomes infected, each node $v$ s.t. $(u,v) \in E$ becomes infected with probability $p$. Let $f(S)$ be the expected number of infected nodes in the end. It can be proved that $f$ is submodular. One can think of this infection as some piece of information spreading in a network (say, the blogosphere). Then, if we can start spreading this information through $k$ nodes, which nodes should we pick to make sure that most nodes will get the information in the end?
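As a concrete illustration, $f(S)$ can be estimated by Monte Carlo simulation of the cascade. The sketch below assumes a uniform infection probability $p$ per edge and an adjacency-list graph; the function name and defaults are my own choices, not from any library:

```python
import random

def expected_spread(graph, seeds, p, trials=2000, rng=None):
    """Monte Carlo estimate of f(S): the expected number of nodes
    infected when the epidemic starts at `seeds` and each newly
    infected node u infects each neighbor v with probability p."""
    rng = rng or random.Random(0)
    total = 0
    for _ in range(trials):
        infected = set(seeds)
        frontier = list(seeds)
        while frontier:
            u = frontier.pop()
            for v in graph.get(u, []):
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    frontier.append(v)
        total += len(infected)
    return total / trials

# toy path graph 0 - 1 - 2 - 3, stored as adjacency lists
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(expected_spread(graph, {0}, p=0.5))
```

Each edge out of an infected node gets exactly one activation attempt, which is the independent cascade model from the influence maximization literature.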
Another example is vertex cover: we can think of $S$ as a set of nodes in a graph and $f(S)$ as the number of edges covered by those nodes. It is not hard to see that this function is submodular. Set cover and other covering problems can be modelled as submodular functions. Normally, the main question is something like: how can I maximize $f$ given that I can pick at most $k$ elements for $S$? Or, stated in mathematical terms:

$\max \left\{ f(S) : \vert S \vert \leq k \right\}$
The fact that $f$ is submodular makes it easy to approximate this maximum using a greedy algorithm, thanks to the following very interesting result:
Theorem 1 (Nemhauser et al., '78) Given a monotone (i.e., $f(A) \leq f(B)$ for $A \subseteq B$) submodular function $f$ with $f(\emptyset) = 0$, the set $S_k$ obtained by taking $S_0 = \emptyset$ and

$S_{i+1} = S_i \cup \left\{ \operatorname{argmax}_x f(S_i \cup \{x\}) \right\}$

is a $\left(1 - \frac{1}{e}\right)$-approximation to $\max \left\{ f(S) : \vert S \vert \leq k \right\}$.
Proof: First, notice that for any sets $S$ and $T$:

$f(T) \leq f(S \cup T) \leq f(S) + \sum_{x \in T \setminus S} \left[ f(S \cup \{x\}) - f(S) \right]$

where the first inequality is monotonicity and the last part follows by submodularity (telescope over the elements of $T \setminus S$). Let $S^*$ be the optimal solution for the problem, so we have that:

$f(S^*) \leq f(S_i) + \sum_{x \in S^* \setminus S_i} \left[ f(S_i \cup \{x\}) - f(S_i) \right] \leq f(S_i) + k \left[ f(S_{i+1}) - f(S_i) \right]$

because of our greedy choice of the element added to $S_i$, and since $\vert S^* \vert \leq k$. So, if $\delta_i = f(S^*) - f(S_i)$, then we know that:

$\delta_i \leq k \left( \delta_i - \delta_{i+1} \right)$

and therefore:

$\delta_{i+1} \leq \left( 1 - \frac{1}{k} \right) \delta_i$

Solving that recursion, it is easy to see that:

$\delta_k \leq \left( 1 - \frac{1}{k} \right)^k \delta_0 \leq \frac{1}{e} f(S^*)$

which gives the bound.
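The greedy algorithm of Theorem 1 is only a few lines of Python. The sketch below runs it on the edge-coverage function from the vertex cover example above; the graph and all names are my own toy choices:

```python
def greedy_max(f, ground, k):
    """Greedy algorithm from Theorem 1: start from the empty set and
    repeatedly add the element with the largest marginal gain."""
    S = set()
    for _ in range(k):
        best = max((x for x in ground if x not in S),
                   key=lambda x: f(S | {x}))
        S.add(best)
    return S

# f(S) = number of edges covered by the vertex set S (submodular)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
covered = lambda S: sum(1 for u, v in edges if u in S or v in S)

S = greedy_max(covered, range(5), k=2)
print(sorted(S), covered(S))  # [0, 2] 4
```

Note that the guarantee is for monotone submodular $f$ with $f(\emptyset) = 0$; the same loop run on an arbitrary set function carries no such bound.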
The second time I heard about submodularity last week was in a nice talk in the Theory Discussion Group about the paper Approximating submodular functions everywhere. Basically, there is a monotone submodular function $f : 2^{[n]} \to \mathbb{R}$ whose values are given by an oracle. The paper shows how to construct a function $\hat{f}$ that approximates $f$ everywhere by calling the oracle only a polynomial number of times. The approximation is such that:

$\hat{f}(S) \leq f(S) \leq \alpha(n) \, \hat{f}(S), \quad \forall S \subseteq [n]$

where $\alpha(n) = \tilde{O}(\sqrt{n})$.
The paper is very deep, so I will not comment on it here, but rather on some nice things I learned in the introduction of the talk about submodular functions and polymatroids. The polymatroid is the following polyhedron associated with a submodular function $f$:

$P_f = \left\{ x \in \mathbb{R}^n_{\geq 0} : x(S) \leq f(S), \ \forall S \subseteq [n] \right\}$

where $x(S) = \sum_{i \in S} x_i$.
One nice thing about those objects is that we can easily optimize linear functions over them using a greedy algorithm:
Theorem 2 Consider a weight function $w : [n] \to \mathbb{R}_+$ and a monotone submodular function $f : 2^{[n]} \to \mathbb{R}_+$ with $f(\emptyset) = 0$. Then the problem:

$\max \left\{ w^T x : x \in P_f \right\}$
can be solved by the following greedy algorithm: order the elements of the ground set according to their $w$-value, i.e., $w_1 \geq w_2 \geq \dots \geq w_n$, let $S_i = \{1, \dots, i\}$ with $S_0 = \emptyset$, and define

$x_i = f(S_i) - f(S_{i-1})$
Proof: We need to prove that $x$ is a feasible point of $P_f$ and then that it is optimal. To prove feasibility, we need to show that $x(S) \leq f(S)$ for all $S$. We prove it for all sets $S \subseteq S_k$, for all $k$, and we do that by induction on $k$. For $k = 0$, it is trivial. Then, supposing we have proved it for $k - 1$, let's prove it for $k$. Take $S \subseteq S_k$. If $k \notin S$, then $S \subseteq S_{k-1}$ and we are done. If not:

$x(S) = x(S \setminus \{k\}) + x_k \leq f(S \setminus \{k\}) + f(S_k) - f(S_{k-1}) \leq f(S \setminus \{k\}) + f(S) - f(S \setminus \{k\}) = f(S)$

where the last inequality follows by submodularity, since $S \setminus \{k\} \subseteq S_{k-1}$.
Now, we need to prove optimality, and this is essentially a primal-dual argument. The dual of the linear program:

$\max \left\{ w^T x : x(S) \leq f(S), \ \forall S \subseteq [n]; \ x \geq 0 \right\}$

is the program:

$\min \left\{ \sum_{S \subseteq [n]} y_S f(S) : \sum_{S \ni i} y_S \geq w_i, \ \forall i; \ y \geq 0 \right\}$
Consider the following dual solution: $y_{S_i} = w_i - w_{i+1}$ for $i = 1, \dots, n$ (take $w_{n+1} = 0$) and $y_S = 0$ for all other sets. It is clearly feasible, since $\sum_{S \ni i} y_S = \sum_{j \geq i} (w_j - w_{j+1}) = w_i$, and its objective value matches that of the primal solution. Both are therefore optimal.
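The greedy algorithm of Theorem 2 is equally short. The sketch below runs it with $f(S) = \sqrt{\vert S \vert}$ (monotone submodular with $f(\emptyset) = 0$) and brute-force checks that the resulting point lies in $P_f$; all names and the toy weights are illustrative:

```python
from itertools import combinations
from math import sqrt

def polymatroid_greedy(f, weights):
    """Greedy algorithm from Theorem 2: sort the coordinates by
    decreasing weight and set x_i = f(S_i) - f(S_{i-1}) along the chain."""
    order = sorted(weights, key=weights.get, reverse=True)
    x, prev, S = {}, 0.0, set()
    for i in order:
        S.add(i)
        val = f(S)
        x[i] = val - prev
        prev = val
    return x

# f(S) = sqrt(|S|): monotone submodular with f(empty) = 0
f = lambda S: sqrt(len(S))
w = {'a': 3.0, 'b': 2.0, 'c': 1.0}
x = polymatroid_greedy(f, w)

# brute-force check that x lies in the polymatroid: x(S) <= f(S) for all S
ground = list(w)
feasible = all(sum(x[i] for i in C) <= f(C) + 1e-9
               for r in range(len(ground) + 1)
               for C in combinations(ground, r))
print(x, feasible)
```

Note the contrast with Theorem 1: here the greedy algorithm is exactly optimal, not just an approximation, which is the "matroid flavour" the next paragraph mentions.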
Notice that the proof has a "matroid flavour", and that is where the name polymatroid comes from. Note that the rank function of a matroid is a submodular function, and that the polymatroid of the rank function is what is called the independent set polytope of the matroid. More about it in a later blog post.