The basic idea behind the Bayesian approach to probability and statistics is fairly straightforward. In fact it mimics the way we operate in daily life quite accurately. Bayes provides a means of updating the probability some event will occur as we acquire new information. This sounds like a fairly uncontentious way of going about business, and yet for over two hundred years Bayes was rejected by the mainstream community of statisticians – although the military has always used it secretly, and left the egg-heads to debate the philosophical issues (Bayes was used in the Second World War to help crack the Enigma Code). The basic philosophy behind Bayes is well expressed in the following quote:
When the facts change, I change my opinion. What do you do, sir?
John Maynard Keynes
The Bayes Rule for computing probabilities was formulated by the Rev Thomas Bayes, an eighteenth century clergyman. It was published posthumously and didn’t really receive any attention until the great mathematician Laplace reformulated and extended the idea. Even then statisticians remained in denial for over two hundred years until the overwhelming success of some high profile applications meant they could not hide their heads in the sand any longer – but believe it or not there is still a statisticians’ equivalent of the flat earth society.
Classical statistics is far more rigid. It assumes a population of objects we are interested in and that some unknown parameters describe this population perfectly. This is flawed from the start – but let’s not get distracted by philosophical issues. We then sample the population and derive statistics which, through a whole lot of jiggery-pokery, are assumed to approximate the parameters of the whole population. Bayes is far more direct. Nothing is fixed and probabilities vary as new information becomes available. Imagine it is summer and rain is a rarity. The weather forecast gives a high probability of rain for the following day. Do we choose to ignore this information, or do we make sure we go out with an umbrella the following day? The latter would seem to make sense – a perfect example of Bayesian thinking.
If we are going to get any further than anecdotes and analogies, unfortunately we need a smattering of math. It isn’t onerous, and as long as we remember the meaning behind the math the end results should make sense, although I should warn that the results from Bayesian analysis are often counter-intuitive.
Conditional Probabilities
A central notion used in Bayes is that of conditional probability. Please do not fast forward at this point; conditional probabilities are fairly easy to understand. Going back to the rain example, we might say that the probability of rain on a given day in summer is just two percent – a probability of 0.02. So given it is summer, we assume that rain tomorrow is fairly unlikely. This could be expressed as P(Rain|Summer), read as ‘the probability of rain given it is summer’; the vertical bar can be read as ‘given that’. We already know the answer: P(Rain|Summer) = 0.02. This is the conditional probability of rain given that it is summer.
Conditional probabilities can be expressed in terms of unconditional probabilities via a simple formula. To get to this formula we’ll use a Venn diagram.
The blue rectangle represents all the days in the year – 365 of them. The yellow circle represents all the summer days – say 90 of them. The large brown circle represents all the days it rains – 150 say. The bit we are really interested in is the small orange area, the days when it is summer and it rains. The probability of rain on any particular day in the year is the number of days it rains divided by the number of days in the year. We’ve already said it rains 150 days a year, and so the probability of rain on any given day is 150/365, which equals 0.411, or forty one percent. This takes no account of season; it is just the chance of rain on a day picked at random from the year. However we know that it is summer. The minute we say ‘given that it is summer’ we lose interest in all the other days – the yellow circle becomes our universe.
To calculate P(Rain|Summer) we need to know the number of days it rains when it is summer, and divide by the total number of days in summer. In the Venn diagram it means dividing the number of days in the small orange area by the number of days in the yellow area.
Now for some more notation (and mathematics is nothing more than notation). When two areas in a Venn diagram overlap they form an intersection – because they intersect. Mathematicians use an inverted ‘U’ to symbolize this, but we’ll use the ‘&’ sign. In other words ‘rain and summer’ is represented by ‘rain&summer’ – the intersection on the diagram above. We’ve already stated that the probability of rain given that it is summer can be calculated by dividing the number of days in the summer when it rains by the number of days in summer. Dividing both of those counts by the 365 days in the year turns them into probabilities without changing the ratio, so this can be expressed mathematically as:
P(Rain|Summer) = P(Rain&Summer)/P(Summer) – (i)
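As a rough check on equation (i), here is a small Python sketch that starts from day counts rather than probabilities. The count of rainy summer days is an assumption: the article never states it, but 1.8 days on average is what P(Rain|Summer) = 0.02 implies for a 90 day summer. The variable names are ours.

```python
# Day counts taken from the Venn diagram description.
DAYS_IN_YEAR = 365
SUMMER_DAYS = 90

# Assumption: not stated in the text. It is the average implied by
# P(Rain|Summer) = 0.02, i.e. 0.02 * 90 = 1.8 rainy summer days.
RAINY_SUMMER_DAYS = 0.02 * SUMMER_DAYS

p_summer = SUMMER_DAYS / DAYS_IN_YEAR
p_rain_and_summer = RAINY_SUMMER_DAYS / DAYS_IN_YEAR

# Equation (i): P(Rain|Summer) = P(Rain&Summer) / P(Summer)
p_rain_given_summer = p_rain_and_summer / p_summer

print(f"P(Summer)      = {p_summer:.3f}")
print(f"P(Rain&Summer) = {p_rain_and_summer:.4f}")
print(f"P(Rain|Summer) = {p_rain_given_summer:.3f}")
```

The 365 cancels out in the last division, which is why counting days in the yellow circle and dividing probabilities give the same answer.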
Now there is no reason why we cannot reverse the reasoning here and calculate the probability of summer given that it is raining – or P(Summer|Rain). Swapping rain and summer around we get:
P(Summer|Rain) = P(Summer&Rain)/P(Rain) – (ii)
Bear with it – we are almost there.
Now Summer&Rain is just the same as Rain&Summer. Since both the above equations contain P(Summer&Rain) we can isolate this term in both equations by multiplying through by P(Summer) in (i) and P(Rain) in (ii). Having done this we get:
P(Rain|Summer) x P(Summer) = P(Summer|Rain) x P(Rain)
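The equality above can be checked on a toy calendar. The sketch below is only an illustration: the day numbers are made up, and it assumes exactly 2 rainy summer days (the article's figures imply 1.8 on average, which cannot be counted in whole days). Only the totals of 365, 90 and 150 come from the text. Handily, Python uses the same ‘&’ sign for set intersection.

```python
from fractions import Fraction

# Hypothetical toy year: days are numbered 0..364.
year = set(range(365))
summer = set(range(150, 240))            # 90 summer days
# 148 rainy days outside summer plus 2 inside it (assumed for illustration).
rain = set(range(0, 148)) | {160, 200}   # 150 rainy days

def prob(event, universe=year):
    """Exact probability: days of the event inside the universe / days in the universe."""
    return Fraction(len(event & universe), len(universe))

# 'Given that' simply shrinks the universe to the conditioning event.
p_rain_given_summer = prob(rain, universe=summer)
p_summer_given_rain = prob(summer, universe=rain)

# Intersection is symmetric, so Summer&Rain is the same set as Rain&Summer.
assert rain & summer == summer & rain

left = p_rain_given_summer * prob(summer)    # P(Rain|Summer) x P(Summer)
right = p_summer_given_rain * prob(rain)     # P(Summer|Rain) x P(Rain)
print(left, right, prob(rain & summer))
```

Both products come out equal to the joint probability P(Rain&Summer), which is all the rearrangement relies on.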
By rearranging we get a simplified form of Bayes rule:
P(Summer|Rain) = (P(Rain|Summer) x P(Summer))/P(Rain)
So what, you might ask? What we have just done is make it possible to calculate one conditional probability when we know the reverse one. In this trivial example we can ask what the probability is that it is summer, given that it is raining. We know all the probabilities on the right hand side: P(Rain|Summer) = 0.02 as stated earlier, P(Summer) = 90/365 = 0.247, and P(Rain) = 150/365 = 0.411.
So P(Summer|Rain) = (0.02 x 0.247) / 0.411 = 0.012
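For anyone who wants to reproduce the arithmetic, here is a minimal Python sketch of the simplified Bayes rule. The function name bayes_rule and its parameter names are ours, chosen for illustration; the three input values are the ones used in the article.

```python
def bayes_rule(p_evidence_given_h, p_h, p_evidence):
    """Return P(H|E) = P(E|H) * P(H) / P(E)."""
    if not 0 < p_evidence <= 1:
        raise ValueError("P(E) must be greater than 0 and at most 1")
    return p_evidence_given_h * p_h / p_evidence

p_rain_given_summer = 0.02   # P(Rain|Summer)
p_summer = 90 / 365          # P(Summer)
p_rain = 150 / 365           # P(Rain)

p_summer_given_rain = bayes_rule(p_rain_given_summer, p_summer, p_rain)
print(f"P(Summer|Rain) = {p_summer_given_rain:.3f}")
```

Using the unrounded fractions 90/365 and 150/365 rather than 0.247 and 0.411 gives the same 0.012 to three decimal places.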
The probability it is summer given a rainy day is just over one percent, whereas the probability of a rainy day given summer is two percent. It is often difficult to see what this really means, but if you look at the Venn diagram it can be seen that rainy days are a much larger proportion of all days than summer days, and so the proportion of summer days given it is raining is smaller.
I’m going to end this piece by introducing two terms – prior and posterior probabilities. As the name suggests a prior probability is one prior to any conditions, and so P(Summer) is a prior because it is totally unconditioned, whereas P(Summer|Rain) is a posterior because it is post a condition – namely that it is raining. This is what Bayes is really all about and why at the beginning I talked about updating probabilities based on evidence. So our example can be formulated as:
P(Summer|Rain) = P(Summer) x P(Rain|Summer)/P(Rain)
In other words, how does the knowledge that it is raining modify our belief that this might be a summer day? Well, P(Summer), as we have shown before, is 0.247 or around 25%. As soon as we are given the evidence that it is raining, the probability that it is summer drops to just 0.012, or around 1%.
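The prior-to-posterior step can be sketched in Python as multiplying the prior by a single factor. The name update_factor is ours, not standard terminology from the article; the numbers are the article's own.

```python
prior = 90 / 365        # P(Summer), before seeing any evidence
likelihood = 0.02       # P(Rain|Summer), from the article
evidence = 150 / 365    # P(Rain)

# Below 1 means the evidence makes the hypothesis less likely,
# above 1 means it makes the hypothesis more likely.
update_factor = likelihood / evidence
posterior = prior * update_factor   # P(Summer|Rain)

print(f"prior     P(Summer)      = {prior:.3f}")
print(f"factor    P(Rain|Summer)/P(Rain) = {update_factor:.3f}")
print(f"posterior P(Summer|Rain) = {posterior:.3f}")
```

Here the factor is well below 1, because rain is far rarer in summer than across the year as a whole, so seeing rain pulls the belief in summer sharply down.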
Now this is a trivial example, and of course most sane people would know the season of the year. Bayes is typically applied when we cannot observe the outcome and the best we can do is use evidence to establish the probability of a particular event happening. It is used in gambling, stock market forecasting, identifying spam, and very heavily by the military. It was even used to predict the likelihood of a space shuttle accident, before the disaster struck Columbia. Bayes gave an estimate of about one in thirty whereas traditional statistics put the probability at one in a thousand.
In the next article I’ll look at some applications in business – be afraid, be very afraid.