Predicting the Present

July 13th, 2009

I am coming back from EC’09, where I presented my first paper from grad school, Sponsored Search Equilibria for Conservative Bidders, in the AdAuctions Workshop. It was very nice being at a main conference and talking to a lot of people about my work and about general topics in game theory. I’ll try to blog over the next few days about some nice talks I attended and about things I learned there. Let’s start with the keynote talk of the AdAuctions Workshop, given by Hal Varian.

Varian is the chief economist at Google, and in this talk he presented his work on Predicting the Present [more in Google Research blog]. According to Yogi Berra, making predictions is hard, especially about the future. That’s why we turn our attention to the present. His goal in this research is to use the data publicly available in Google Trends to make predictions about economic indicators. A more useful interface is Google Insights for Search, where you can see the search statistics broken down by keyword, category, location, time, … and also download a CSV file with the data.

[Figure: Google search volume over time for real estate queries]

For example, consider the screenshot above, which shows the volume of searches for keywords related to real estate. First, notice that the searches are seasonal: people tend to be much less interested in buying houses at the end of the year, and there is an annual peak in the summer months. Besides this pattern that repeats itself every year, there is a global downward trend in searches related to real estate, possibly because of the economic crisis due to mortgages, loans, … The question the authors try to address is: what can we say about the economy in general just by looking at the search profiles at Google?
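To make the "seasonality plus global trend" reading concrete, here is a minimal sketch of a classical decomposition: a centered 12-month moving average estimates the trend, and the average deviation from it in each calendar month estimates the seasonal pattern. The series below is synthetic (I made it up so that it declines over time, peaks in June and bottoms out in December); it is not real Google data, and the function names are my own.

```python
import math

def moving_average_trend(series, period=12):
    """Centered moving average over one full period (assumes an even period)."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # The window has period + 1 points; the two end points get half weight.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend

def seasonal_profile(series, trend, period=12):
    """Average deviation from the trend for each position in the period."""
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    return [sum(b) / len(b) for b in buckets]

# Synthetic monthly "search index": five years, t = 0 is a January.
# Assumption: linear decline plus a yearly cycle peaking in June (month index 5).
series = [100 - 0.5 * t + 10 * math.cos(2 * math.pi * ((t % 12) - 5) / 12)
          for t in range(60)]

trend = moving_average_trend(series)
profile = seasonal_profile(series, trend)

print("trend in the middle of the sample:", round(trend[30], 2))
print("month index with the highest seasonal effect:", profile.index(max(profile)))
print("month index with the lowest seasonal effect:", profile.index(min(profile)))
```

The first and last six months have no trend estimate because the window does not fit there, which is the usual price of a centered average.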

Real estate was the first example given in the talk, but there are several other interesting questions. For example, the authors compare the sales of Chevrolet and Toyota to the searches for those automobile brands, and they get a pretty good fit. There are isolated points where the value predicted using Google Trends differs considerably from the sales: interestingly, these correspond exactly to big discounts (such as “employee price for everyone”) given by those companies. This is an efficient way of measuring how effective those big marketing campaigns are.
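The idea of fitting sales to searches and then looking at the points that do not fit can be sketched with an ordinary least squares line and a check on the residuals. This is only an illustration and not necessarily the model used in the talk; all the numbers are invented, with one month artificially boosted to play the role of a promotion, and the two-standard-deviation threshold is an arbitrary choice of mine.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_outliers(xs, ys, slope, intercept, threshold=2.0):
    """Indices whose residual exceeds `threshold` residual standard deviations."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    std = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * std]

# Invented monthly data: a search index and sales that roughly track it.
search = [50, 52, 55, 53, 58, 60, 62, 59, 57, 54, 51, 49]
noise = [1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0]
sales = [2 * s + 10 + e for s, e in zip(search, noise)]
sales[4] += 40  # hypothetical promotion: sales jump without extra searches

slope, intercept = fit_line(search, sales)
flagged = flag_outliers(search, sales, slope, intercept)

print("fitted slope:", round(slope, 3))
print("months that the search data fails to explain:", flagged)
```

Note that the boosted month also pulls the fitted line towards itself, so with real data a more robust fit would probably be a better choice.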

After this talk, I was amazed by the amount of data available on the web for making predictions. Google Trends data can be downloaded in CSV format, which makes it easy to start experimenting with it. Other talks drew my attention to other sources of data on the web. Prediction markets, like Intrade, for example, make the prices of their securities available for download. This gives us the evolution of the traders’ beliefs about a specific topic.

I am very interested in data where we can see the evolution over time and analyze which kinds of social phenomena operate on it. Another cool data set for playing with the “time axis” is the Wikipedia dataset. In Wikimedia Downloads, it is possible to download an entire XML dump of Wikipedia. You don’t even need to crawl. All the text, links, … it is all there. There is also an option of downloading the entire edit history of each article. This way, we know the exact state of Wikipedia at each point in time. Other sources of data around the web that I found interesting and would very much like to do something with are Google Fusion Tables and data.gov.
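As a rough idea of how one could walk through the edit history without loading a whole dump into memory, here is a small sketch that streams the XML and collects the revision timestamps of each page. It assumes the usual page / title / revision / timestamp layout of the MediaWiki export format; the tiny sample document is made up for the example, and a real dump would be read from a (decompressed) file instead.

```python
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the XML namespace, which changes between export versions."""
    return tag.rsplit("}", 1)[-1]

def revision_timestamps(xml_stream):
    """Yield (title, [timestamps]) for each <page>, reading the stream incrementally."""
    title, stamps = None, []
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "timestamp":
            stamps.append(elem.text)
        elif name == "page":
            yield title, stamps
            title, stamps = None, []
            elem.clear()  # drop the finished page's children to save memory

# Made-up sample in the shape of a dump with full history.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><timestamp>2009-01-01T00:00:00Z</timestamp><text>v1</text></revision>
    <revision><timestamp>2009-07-13T00:00:00Z</timestamp><text>v2</text></revision>
  </page>
</mediawiki>"""

history = list(revision_timestamps(io.BytesIO(SAMPLE)))
for page_title, timestamps in history:
    print(page_title, "has", len(timestamps), "revisions")
```

With the timestamps in hand, the number of edits per day or per month is just a matter of counting, which gives a time series to play with.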

More interesting stuff about EC soon…
