Looking at probability distributions
I’ve been taking two classes in probability this semester, and in them I saw the proofs of a lot of interesting theorems that I knew about previously but had never seen proved, such as the Central Limit Theorem, the Laws of Large Numbers and so on… Also, some theory that looks somewhat ugly in undergrad courses becomes very clear with the proper formal treatment. Today I was thinking about what the main take-home message of those classes would be for a computer scientist and, at least for me, this message is the various ways of looking at probability distributions. I had heard about moments, the Laplace transform, the Fourier transform and other tools like that, but I never realized their true power before. Probably most of their true power is still hidden from me, but I am starting to look at them in a different way. Let me try to go over a few examples of different ways we can look at probability distributions and show cases where they are interesting.
Most of the ways of looking at probability distributions are associated with a multiplicative system: a multiplicative system is a set $\mathcal{M}$ of real-valued functions with the property that if $f, g \in \mathcal{M}$ then $fg \in \mathcal{M}$. These kinds of sets are powerful because of the Multiplicative Systems Theorem:
Theorem 1 (Multiplicative Systems Theorem) If $\mathcal{M}$ is a multiplicative system, $\mathcal{H}$ is a linear space containing $1$ (the constant function $1$) and is closed under bounded convergence, then $\mathcal{M} \subseteq \mathcal{H}$ implies that $\mathcal{H}$ contains all bounded $\sigma(\mathcal{M})$-measurable functions.
The theorem might look a bit cryptic if you are not familiar with the definitions, but it boils down to the following translation:
Theorem 2 (Translation of the Multiplicative Systems Theorem) If $\mathcal{M}$ is a “general” multiplicative system, and $X$ and $Y$ are random variables such that $\mathbb{E}[f(X)] = \mathbb{E}[f(Y)]$ for all $f \in \mathcal{M}$, then $X$ and $Y$ have the same distribution.
where “general” excludes some troublesome cases such as the set of all constant functions, for example. In technical terms, we want $\sigma(\mathcal{M})$ to be the Borel $\sigma$-algebra. But let’s not worry about those technical details and just look at the translated version. We now discuss several kinds of multiplicative systems:
- The most common description of a random variable $X$ is by the cumulative distribution function $F_X(t) = \mathbb{P}[X \le t]$. This is associated with the system $\mathcal{M} = \{ \mathbb{1}_{(-\infty, t]} : t \in \mathbb{R} \}$: notice that simply $F_X(t) = \mathbb{E}[\mathbb{1}_{(-\infty, t]}(X)]$.
- We can characterize a random variable by its moments: the variable $X$ is characterized by the set $\{\mathbb{E}[X^n] : n \in \mathbb{N}\}$. Given the moments $m_n = \mathbb{E}[X^n]$, the variable is totally characterized, i.e., if two variables have the same moments, then they have the same distribution by the Multiplicative Systems Theorem (strictly speaking, some extra conditions are needed here, since the functions $x^n$ are unbounded). This description is associated with the system $\mathcal{M} = \{ x \mapsto x^n : n \in \mathbb{N} \}$.
- Moment Generating Function: If $X$ is a variable that assumes only non-negative integer values, we can describe it by $p_k = \mathbb{P}[X = k]$, for $k = 0, 1, 2, \dots$. An interesting way of representing those probabilities is as the moment generating function $\phi_X(z) = \mathbb{E}[z^X] = \sum_k p_k z^k$. This is associated with the multiplicative system $\mathcal{M} = \{ x \mapsto z^x : z \in (0,1] \}$. Now suppose we are given two discrete independent variables $X$ and $Y$. What do we know about $X + Y$? It is easy to know its expectation, its variance, … but what about more complicated things? What is the distribution of $X + Y$? Moment generating functions answer this question very easily, since:

$\phi_{X+Y}(z) = \mathbb{E}[z^{X+Y}] = \mathbb{E}[z^X] \, \mathbb{E}[z^Y] = \phi_X(z) \, \phi_Y(z)$
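Just to make this concrete, here is a small numerical sketch of my own (the probability vectors are made up): for integer-valued variables, multiplying generating functions is the same as convolving the vectors of probabilities, so the whole distribution of $X+Y$ comes out of a single convolution.

```python
# Sketch: the distribution of X + Y for independent integer-valued X and Y is the
# convolution of their probability vectors, i.e. the coefficients of phi_X(z)*phi_Y(z).
import numpy as np

p_X = np.array([0.0, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2])   # P[X = k] for k = 0..6 (made up)
p_Y = np.array([0.0, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6])   # a fair die

p_sum = np.convolve(p_X, p_Y)                          # P[X + Y = k] for k = 0..12

# Sanity check: evaluating the generating functions at a point and multiplying
# agrees with evaluating the generating function of the convolution.
z = 0.7
phi_X = np.polynomial.polynomial.polyval(z, p_X)
phi_Y = np.polynomial.polynomial.polyval(z, p_Y)
phi_sum = np.polynomial.polynomial.polyval(z, p_sum)
assert abs(phi_X * phi_Y - phi_sum) < 1e-12
print(p_sum.sum())                                     # ~1.0, so it is a distribution
```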
If we know moment generating functions, we can calculate expectations very easily, since $\mathbb{E}[X] = \phi_X'(1)$. For example, suppose we have a process like this: there is one bacterium at time $0$. In each timestep, each bacterium either dies (with probability $p_0$), continues alive without reproducing (with probability $p_1$) or has two offspring (with probability $p_2$), so that $p_0 + p_1 + p_2 = 1$. At each time, the same happens, independently, with each of the bacteria alive at that moment. The question is: what is the expected number of bacteria at time $t$?
It looks like a complicated problem with just elementary tools, but it is a simple problem if we have moment generating functions. Just let $X_i$ be the variable associated with the $i$-th bacterium at time $t$: it is zero if the bacterium dies, $1$ if it stays the same and $2$ if it has offspring. Let also $Z_t$ be the number of bacteria at time $t$. We want to know $\mathbb{E}[Z_t]$. First, see that:

$Z_{t+1} = \sum_{i=1}^{Z_t} X_i$
Now, let’s write that in terms of moment generating functions:

$\phi_{Z_{t+1}}(z) = \mathbb{E}\left[z^{Z_{t+1}}\right] = \mathbb{E}\left[z^{\sum_{i=1}^{Z_t} X_i}\right] = \sum_k \mathbb{P}[Z_t = k] \, \mathbb{E}\left[z^{X_1 + \dots + X_k}\right]$

which is just:

$\sum_k \mathbb{P}[Z_t = k] \, \phi_X(z)^k$

since the variables $X_i$ are all independent and identically distributed. Now, notice that:

$\sum_k \mathbb{P}[Z_t = k] \, \phi_X(z)^k = \phi_{Z_t}(\phi_X(z))$

by the definition of moment generating function, so we effectively proved that:

$\phi_{Z_{t+1}}(z) = \phi_{Z_t}(\phi_X(z))$
We proved that $\phi_{Z_t}$ is just $\phi_X$ iterated $t$ times. Now, calculating the expectation is easy, using the facts that $\mathbb{E}[Z_t] = \phi_{Z_t}'(1)$ and $\phi_X(1) = 1$. Just see that: $\phi_{Z_{t+1}}'(1) = \phi_{Z_t}'(\phi_X(1)) \, \phi_X'(1) = \phi_{Z_t}'(1) \, \phi_X'(1)$. Then, clearly $\mathbb{E}[Z_t] = \mathbb{E}[X]^t = (p_1 + 2p_2)^t$. Using a similar technique we can prove a lot more things about this process, just by analyzing the behavior of the moment generating function.
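Here is a quick Monte Carlo sanity check of this formula (a sketch of my own, not from the lectures; the probabilities $p_0, p_1, p_2$ are made up): simulate the branching process and compare the empirical mean of $Z_t$ with $(p_1 + 2p_2)^t$.

```python
# Sketch: simulate the bacteria process and check E[Z_t] = (p1 + 2*p2)^t.
import random

p0, p1, p2 = 0.3, 0.5, 0.2       # die / survive / two offspring (hypothetical values)
t, trials = 10, 20000

def step(z):
    """One generation: each of the z bacteria independently dies, survives or splits."""
    new_z = 0
    for _ in range(z):
        u = random.random()
        if u < p1:
            new_z += 1           # survives without reproducing
        elif u < p1 + p2:
            new_z += 2           # two offspring
        # else: dies and contributes 0
    return new_z

total = 0
for _ in range(trials):
    z = 1                        # one bacterium at time 0
    for _ in range(t):
        z = step(z)
    total += z

print(total / trials, (p1 + 2 * p2) ** t)   # the two numbers should be close
```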
- Laplace Transform: Now, moving to continuous variables, if $X$ is a continuous non-negative variable we can define its Laplace transform as $L_X(\lambda) = \mathbb{E}[e^{-\lambda X}] = \int_0^\infty e^{-\lambda x} \, \mu_X(dx)$, where $\mu_X$ stands for the distribution of $X$; for example, $\mu_X(dx) = f_X(x)\,dx$ when $X$ has a density $f_X$. This is associated with the multiplicative system $\mathcal{M} = \{ x \mapsto e^{-\lambda x} : \lambda \ge 0 \}$. Again, by the Multiplicative Systems Theorem, if $L_X = L_Y$, then the two variables have the same distribution. The Laplace transform has the same nice properties as the moment generating function, for example, $\mathbb{E}[X] = -L_X'(0)$. And it allows us to do tricks similar to the one I just showed for moment generating functions. One common trick, used for example in the proof of Chernoff bounds, is the following: given independent non-negative random variables $X_1, \dots, X_n$ and $\lambda > 0$,

$\mathbb{P}\left[\sum_i X_i \ge a\right] = \mathbb{P}\left[e^{\lambda \sum_i X_i} \ge e^{\lambda a}\right] \le e^{-\lambda a} \, \mathbb{E}\left[e^{\lambda \sum_i X_i}\right] = e^{-\lambda a} \prod_i \mathbb{E}\left[e^{\lambda X_i}\right]$
where we also used Markov’s Inequality: $\mathbb{P}[Y \ge a] \le \mathbb{E}[Y]/a$ for a non-negative variable $Y$. Passing to the Laplace transform is the main ingredient in the Chernoff bound and it allows us to sort of “decouple” the random variables in the sum.
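To illustrate this decoupling, here is a rough sketch in code (my own example; the parameters are made up): for a sum of independent Bernoulli variables, optimize the bound $e^{-\lambda a}\prod_i \mathbb{E}[e^{\lambda X_i}]$ over a grid of $\lambda$ and compare it with the exact tail probability.

```python
# Sketch of the Chernoff recipe for a sum of n independent Bernoulli(q) variables:
# P[sum X_i >= a] <= min over lambda > 0 of exp(-lambda*a) * prod_i E[exp(lambda*X_i)].
import math

n, q, a = 100, 0.5, 70                         # 100 fair coins, threshold 70 (made up)

def chernoff_bound(lam):
    mgf_single = 1 - q + q * math.exp(lam)     # E[exp(lambda * X_i)] for Bernoulli(q)
    return math.exp(-lam * a) * mgf_single ** n

best = min(chernoff_bound(l / 100) for l in range(1, 500))   # optimize over a grid

# Exact tail probability of the Binomial(n, q), for comparison
exact = sum(math.comb(n, k) * q**k * (1 - q)**(n - k) for k in range(a, n + 1))

print(best, exact)   # the optimized bound upper-bounds the exact tail
```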
There are several other cases where the Laplace transform proves itself very useful and turns things that looked very complicated when we saw them in undergrad courses into simple and clear things. One clear example of that is the motivation for the Poisson random variable: if $T_1, T_2, \dots$ are independent exponentially distributed random variables with mean $1/\lambda$, then $\mathbb{P}[T_i \ge t] = e^{-\lambda t}$. An elementary calculation shows that the Laplace transform of each $T_i$ is $L_{T_i}(s) = \mathbb{E}[e^{-s T_i}] = \frac{\lambda}{\lambda + s}$. Let $S_n = T_1 + \dots + T_n$, i.e., the time of the $n$-th arrival. We want to know what is the distribution of $S_n$. How do we do that?
Since the $T_i$ are independent, $L_{S_n}(s) = \prod_{i=1}^n L_{T_i}(s) = \left(\frac{\lambda}{\lambda + s}\right)^n$. Now, we need to find a density $f_{S_n}$ such that $\int_0^\infty e^{-st} f_{S_n}(t)\,dt = \left(\frac{\lambda}{\lambda + s}\right)^n$. Now it is just a matter of solving this equation and we get: $f_{S_n}(t) = \frac{\lambda^n t^{n-1}}{(n-1)!} e^{-\lambda t}$. Now, the Poisson variable $N_t$ measures the number of arrivals in $[0, t]$ and therefore:

$\mathbb{P}[N_t = n] = \mathbb{P}[S_n \le t] - \mathbb{P}[S_{n+1} \le t] = e^{-\lambda t} \frac{(\lambda t)^n}{n!}$
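A quick simulation sketch (my own, with a made-up rate $\lambda$ and horizon $t$): counting exponential arrivals in $[0, t]$ indeed reproduces the Poisson probabilities.

```python
# Sketch: exponential interarrival times with rate lam give Poisson(lam*t) many
# arrivals in [0, t].
import math
import random

lam, t, trials = 2.0, 3.0, 50000
counts = {}

for _ in range(trials):
    time, n = 0.0, 0
    while True:
        time += random.expovariate(lam)        # next interarrival time T_i
        if time > t:
            break
        n += 1
    counts[n] = counts.get(n, 0) + 1

for n in range(10):
    empirical = counts.get(n, 0) / trials
    poisson = math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)
    print(n, round(empirical, 4), round(poisson, 4))   # the two columns should match
```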
- Characteristic Function or Fourier Transform: Taking the multiplicative system $\mathcal{M} = \{ x \mapsto e^{i\theta x} : \theta \in \mathbb{R} \}$ we get the Fourier Transform $\hat{\mu}_X(\theta) = \mathbb{E}[e^{i\theta X}]$, which also has some of the nice properties of the previous ones and some additional ones. The characteristic functions were the main actors in the development of all the probability techniques that led to the main result of 19th century Probability Theory: the Central Limit Theorem. We know that moment generating functions and Laplace transforms completely characterize the distributions, but it is not clear how to recover a distribution once we have its transform. For the Fourier Transform there is a clear and simple way of doing that by means of the Inversion Formula (here in the case where $\hat{\mu}_X$ is integrable and $X$ has a density $f_X$):

$f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\theta x} \, \hat{\mu}_X(\theta) \, d\theta$
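To see the inversion formula in action, here is a small numerical sketch of my own (not from the lectures): take $\hat{\mu}(\theta) = 1/(1+\theta^2)$, the characteristic function of the two-sided exponential distribution with density $e^{-|x|}/2$, truncate the integral and check that the density comes back out.

```python
# Sketch: numerically invert a characteristic function,
#   f(x) ~ (1 / 2*pi) * integral over [-L, L] of exp(-i*theta*x) * phi(theta) d(theta).
import cmath
import math

def invert(phi, x, limit=200.0, step=0.01):
    n_steps = int(2 * limit / step)
    total = 0.0
    for k in range(n_steps):
        theta = -limit + k * step
        total += (cmath.exp(-1j * theta * x) * phi(theta)).real * step
    return total / (2 * math.pi)

phi = lambda theta: 1.0 / (1.0 + theta ** 2)          # char. function of exp(-|x|)/2
for x in (0.0, 1.0, 2.0):
    print(x, invert(phi, x), math.exp(-abs(x)) / 2)   # the two values should be close
```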
One fact that always puzzled me was: why is the normal distribution so important? What does it have that is so special to be the limiting distribution in the Central Limit Theorem, i.e., if $X_1, X_2, \dots$ is a sequence of independent random variables and $S_n = X_1 + \dots + X_n$, then $\frac{S_n - \mathbb{E}[S_n]}{\sqrt{\operatorname{Var}(S_n)}} \to N(0,1)$ in distribution under some natural conditions on the variables. The reason the normal is so special is that it is a “fixed point” for the Fourier Transform: for $X \sim N(0,1)$ we can see that $\mathbb{E}[e^{i\theta X}] = e^{-\theta^2/2}$, so the transform of a Gaussian is again a Gaussian. And there we have something special about it that makes me believe the Central Limit Theorem.
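One can actually watch this convergence numerically; here is a last sketch of my own (not from the lectures): the empirical characteristic function of a standardized sum of i.i.d. uniforms gets close to $e^{-\theta^2/2}$.

```python
# Sketch: the characteristic function of a standardized sum of n i.i.d. Uniform(0,1)
# variables approaches exp(-theta^2 / 2), the characteristic function of N(0, 1).
import cmath
import math
import random

n, samples = 30, 20000
sigma = math.sqrt(1 / 12)                  # standard deviation of Uniform(0, 1)

def standardized_sum():
    s = sum(random.random() for _ in range(n))
    return (s - n * 0.5) / (sigma * math.sqrt(n))

draws = [standardized_sum() for _ in range(samples)]

for theta in (0.5, 1.0, 2.0):
    empirical = sum(cmath.exp(1j * theta * x) for x in draws) / samples
    print(theta, abs(empirical), math.exp(-theta ** 2 / 2))   # should be close
```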
————————-
This blog post was based on lectures by Professor Dynkin at Cornell.