If you are unfamiliar with Bayes please read part 1 of this series.
In the previous article we established a simplified form of Bayes rule for a very specific example involving the weather and the season of the year. The equation looked like this:
P(Summer|Rain) = P(Summer) x P(Rain|Summer) / P(Rain)
Obviously we want to generalise this, and so we replace ‘summer’ and ‘rain’ with the variable events A and B. For those not familiar with probability notation a slight detour is needed: please read this to get the basic vocabulary.
So our expression for Bayes becomes the celebrated formula:
P(A|B) = P(A) x P(B|A) / P(B)
The way it is expressed here emphasises the relationship between the posterior and the prior probabilities, P(A|B) and P(A) respectively. This formula shows how knowledge that event B has occurred changes the probability that event A has occurred. We need to know all three terms on the right to compute the value on the left. Critics of Bayes quite rightly point out that this information is not always available – but that is not the point. Often it is, and when it is we can perform a little magic that almost always surprises.
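If you prefer to see the arithmetic written out, here is a minimal Python sketch of the formula. The numbers in the example are invented purely for illustration; all the sketch does is multiply the prior by the likelihood and divide by the evidence.

```python
# A minimal sketch of Bayes rule: posterior = prior x likelihood / evidence.
def bayes(prior, likelihood, evidence):
    """Return P(A|B) given P(A), P(B|A) and P(B)."""
    return prior * likelihood / evidence

# Invented numbers, purely for illustration: if P(A) = 0.25, P(B|A) = 0.4
# and P(B) = 0.5, then learning that B has occurred lowers P(A) from 0.25 to 0.2.
print(bayes(prior=0.25, likelihood=0.4, evidence=0.5))  # 0.2
```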
The first application to a business scenario we are going to make is that of recruitment. I’m afraid there is a little more mathematical manipulation we have to plough through, but it shouldn’t be too arduous.
The term P(B) is often not known directly, and to create the full-blown expression for Bayes we need to consider how P(B) might be broken down into something more general. To get to grips with this we need to consider the diagram below.
The rectangle represents our sample space and the lines divide it into a number of events (sets) E1, E2, E3, … , En. Of course in the diagram there are just four segments, but there could be any number n. The way the sample space has been divided up is known as a partition – it covers the whole sample space with no overlapping areas. Or to use the jargon, E1, E2, … , En are pairwise disjoint. Without getting too bogged down in the math it should be clear that any event A can be reconstructed from its intersections with each of these partitioned areas. More specifically we can state that (remembering we use the & symbol for an intersection):
A = (A&E1) U (A&E2) U … U (A&En)
The ‘U’ is being used as the symbol for union – similar to addition when numbers are considered instead of sets. Because the partition is pairwise disjoint we can represent the probability of A as the sum of the probabilities of the intersections.
P(A) = P(A&E1) + P(A&E2) + … + P(A&En)
Just one more trick. We saw in the previous article that P(A&E1) is equal to P(A|E1) x P(E1). So substituting this back into the equation above we get:
P(A) = P(A|E1) x P(E1) + P(A|E2) x P(E2) + … + P(A|En) x P(En)
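To make the sum concrete, here is a small Python sketch of this ‘law of total probability’. The four priors and likelihoods are invented for illustration only; the point is simply that P(A) is a weighted sum over the partition.

```python
# A sketch of the law of total probability: P(A) as a weighted sum over a partition.
# The priors P(Ei) and likelihoods P(A|Ei) below are invented for illustration.
priors = [0.2, 0.3, 0.4, 0.1]        # P(E1), ..., P(E4); a partition, so they sum to 1
likelihoods = [0.5, 0.1, 0.25, 0.8]  # P(A|E1), ..., P(A|E4)

p_a = sum(p * l for p, l in zip(priors, likelihoods))
print(round(p_a, 2))  # 0.31
```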
Unfortunately this pain is necessary to get to the most useful form of Bayes rule. We are going to move a few things around, and will now consider P(Ei|A) as the posterior probability we wish to calculate. There is a rabbit hole here and I don’t want to go down it right now, but when we are updating a probability using Bayes rule we need to know all the possible scenarios that might have happened. In our ‘summer’, ‘rain’ example we were looking to calculate the probability that it is summer given that it has rained. The other options are that it might be winter, spring or autumn. The four seasons together make up the whole sample space of days in the year, and we are using the knowledge that it has rained to establish the most likely season. We only calculated the value for summer and found it was pretty unlikely. The important point here is that we are using the evidence given to us to update the probabilities of a number of alternatives, which together should cover all possibilities. So without further ado let’s restate Bayes:
P(Ei|A) = P(Ei) x P(A|Ei) / (P(A|E1) x P(E1) + P(A|E2) x P(E2) + … + P(A|En) x P(En))
This is the general, and most useful, formulation of Bayes rule, and it will become apparent very soon why this is so.
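As a sketch of what the formula is doing, the Python snippet below (with the same invented numbers as before) updates every member of the partition in one go. Notice that the denominator is just the P(A) we built from the law of total probability, and that the posteriors sum to 1.

```python
# A sketch of the general form of Bayes rule: update every Ei in the partition at once.
def bayes_update(priors, likelihoods):
    """Given P(Ei) and P(A|Ei) for each Ei, return the posteriors P(Ei|A)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # this is P(A)
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Invented numbers again, purely for illustration:
posteriors = bayes_update([0.2, 0.3, 0.4, 0.1], [0.5, 0.1, 0.25, 0.8])
print([round(p, 3) for p in posteriors])  # [0.323, 0.097, 0.323, 0.258] -- these sum to 1
```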
Probably a good time to take a break.
OK – here is the business problem we are going to consider, and not only will it provide a surprising insight into the recruitment process, but it should help us put the pieces together. Your organisation is looking to recruit someone with very specific IT skills – C++, SAP, SOA, Insurance Industry, Linux – not too many of those around. Prior experience shows you that in a situation like this only one out of every twenty applicants is likely to be suitable. But to compensate for this, the recruitment process in your organisation is top notch and gets it right nine times out of ten – 90% of the time.
We are going to use a decision tree to represent this process, and to clarify the reasoning behind the Bayes rule.
The first split on the tree is whether a candidate is suitable. Of course we don’t actually know which candidates are suitable, just that 19 out of 20 are usually unsuitable. What we do know is that, given a suitable candidate, there is a 90% chance she or he will be accepted. Similarly we know there is a 10% chance they will be rejected. The situation reverses for unsuitable candidates – with an acceptance rate of 10% and a rejection rate of 90%. If you have followed so far some bells should be ringing. We’ve just talked about the probability of accepting a candidate given he or she is suitable.
What we want to know is the probability that a candidate is suitable given that they have been accepted – this is the only thing we are really interested in. So now I’m going to list the things we know and plug them into Bayes rule. Writing S for suitable, U for unsuitable, A for accepted and R for rejected, we want P(S|A).
We know:
P(A|S) = 0.9
P(R|S) = 0.1
P(A|U) = 0.1
P(R|U) = 0.9
P(S) = 0.05
P(U) = 0.95
Expressed in terms of Bayes formula we transform
P(Ei|A) = P(Ei) x P(A|Ei) / (P(A|E1) x P(E1) + P(A|E2) x P(E2) + … + P(A|En) x P(En))
to
P(S|A) = P(S) x P(A|S) / (P(A|S) x P(S) + P(A|U) x P(U))
Note that S and U partition our sample space.
When we put the numbers in this becomes:
P(S|A) = (0.05 x 0.9)/(0.05 x 0.9 + 0.1 x 0.95) = 0.32
The probability we recruit a suitable candidate is less than one third, at 32%. On the face of it this does not seem reasonable in light of the excellent recruitment skills of the organisation. The probability is skewed by the fact that there is such a large number of unsuitable candidates, and even though 90% of them get rejected, the 10% that are accepted make up a large slice of the acceptance pie. The solution to this problem is simple enough – do a second round of interviews. We know that for this second round there is a 32% chance that a candidate is suitable, so 0.32 becomes our new prior. Plug the numbers in again and the probability that an accepted candidate is suitable rises to 81% – somewhat better.
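For anyone who wants to check the arithmetic, here is a short Python sketch of both interview rounds. The bayes_update helper is the same sketch as before, and the only numbers used are the ones already stated above.

```python
# Checking the recruitment numbers: suitable (S) vs unsuitable (U), given acceptance (A).
def bayes_update(priors, likelihoods):
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

p_suitable = 0.05               # P(S): one applicant in twenty is suitable
accept_given = [0.9, 0.1]       # P(A|S), P(A|U): the 90%-accurate process

# First interview round.
p_suitable, _ = bayes_update([p_suitable, 1 - p_suitable], accept_given)
print(round(p_suitable, 2))     # 0.32

# Second interview round, using the first-round posterior as the new prior.
p_suitable, _ = bayes_update([p_suitable, 1 - p_suitable], accept_given)
print(round(p_suitable, 2))     # 0.81
```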
In the next article I’m going to go over this example with a fine-tooth comb to map the features of this problem onto Bayes rule.
Phew.