Summary – Use Association Rule Mining (ARM) when you want to discover rules relating attribute values to each other. The general form of a rule is ‘IF this THEN that’, and in an analysis of supermarket shopping habits an example might be IF milk THEN bread. Establishing the usefulness of rules is a major part of most ARM projects.
On the face of it, association rule mining (ARM) is a simple enough activity: we let the ARM algorithms loose on the data of our choice and a plethora of rules will be found. The general format for these rules is as follows:
IF this THEN that
To add some meat to the bones, let’s consider the habits of shoppers at a supermarket. A specific rule might say:
IF (bread and milk) THEN (jam)
The ARM algorithm will have plowed through millions of combinations of products to identify this particular rule and thousands of others. In fact, this is one of the problems associated with ARM: it finds an excess of rules, many of which are simply not useful.
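To get a feel for the scale of the search, here is a minimal Python sketch that enumerates every candidate rule over a small set of products. The item names and the brute-force approach are purely illustrative; real ARM algorithms prune the search rather than enumerate everything.

from itertools import combinations

items = ["bread", "milk", "jam", "cheese", "apples", "beef", "oranges"]

candidate_rules = []
for size in range(2, len(items) + 1):
    for itemset in combinations(items, size):
        # every split of the itemset into a non-empty left and right side
        for left_size in range(1, size):
            for left in combinations(itemset, left_size):
                right = tuple(i for i in itemset if i not in left)
                candidate_rules.append((left, right))

print(len(candidate_rules))  # 1932 candidate rules from just 7 products

Just seven products already yield 1,932 candidate rules, and the count grows exponentially with the number of products, which is why exhaustive enumeration is impossible for a real product catalogue.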
ARM is generally categorized as an unsupervised learning method. Supervised learning involves giving the data mining algorithm a target variable to classify (a good or bad loan prospect, for example) and a data set to learn from. Unsupervised learning does not involve a target variable and is more exploratory in nature.
In the supermarket example the ARM algorithm will typically produce thousands of rules of the form mentioned above. Another one might be IF (cheese and apples) THEN (beef and oranges). When we look at the data, we might find that the number of people buying cheese and apples together is quite small, and that only 100 cases out of millions of shopping baskets obeyed this rule – clearly such a rule doesn’t mean very much.
To be left with a meaningful set of rules, there are a number of measures that can be used to filter out the more trivial ones. Applying such measures is a major part of using ARM, so we will give it some attention, with the help of a Venn diagram to make things easier to understand.
Picture a Venn diagram with three basic shapes. A blue rectangle represents all the shopping baskets we wish to consider, Nt baskets in total. A yellow circle represents all the shopping baskets with the ‘this’ combination of goods, a green circle represents all the baskets with the ‘that’ combination of goods, and the intersection of the two circles represents our rule. Going back to our basic rule formulation, we can rephrase it as:
IF left THEN right
Both ‘left’ and ‘right’ represent combinations of items, and are labeled as such because they appear on the left- and right-hand sides of the rule. So Nl is the number of instances (shopping baskets) that satisfy the ‘left’ combination of items (cheese and apples, say), and Nr is the number of instances that satisfy the ‘right’ combination of items (beef and oranges). The instances which satisfy the actual rule are represented by the intersection of the two circles and number Nb. So, having got the terminology out of the way, we can define three central measures used to filter useful rules.
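As a concrete illustration, the following Python sketch computes the four counts from a handful of invented baskets; the basket contents and the left/right split are assumptions made up purely for this example.

# Each basket is a set of items; the data here is invented for illustration.
baskets = [
    {"bread", "milk", "jam"},
    {"bread", "milk", "eggs"},
    {"cheese", "apples", "beef", "oranges"},
    {"bread", "milk", "jam", "butter"},
    {"tea", "sugar"},
]

left = {"bread", "milk"}   # the 'this' side of the rule
right = {"jam"}            # the 'that' side of the rule

n_t = len(baskets)                                   # Nt: all baskets
n_l = sum(left <= b for b in baskets)                # Nl: baskets containing 'left'
n_r = sum(right <= b for b in baskets)               # Nr: baskets containing 'right'
n_b = sum((left | right) <= b for b in baskets)      # Nb: baskets containing both

print(n_t, n_l, n_r, n_b)  # 5 3 2 2

Each count is simply a matter of testing whether the relevant combination of items is a subset of a basket.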
The first is Confidence, calculated as Nb/Nl. This is a measure of the number of baskets that satisfy the rule as a fraction of the baskets that satisfy the left-hand combination. To make this meaningful, we will go back to our first rule:
IF (bread and milk) THEN (jam)
Let’s say there are 500,000 shopping baskets that contained bread and milk (Nl) out of a total of 1,000,000 shopping baskets (Nt), and that 300,000 shopping baskets contained jam (Nr). The number of shopping baskets which contained bread, milk and jam is 200,000 (Nb). Confidence for this rule would be 200,000 / 500,000 = 0.4, or 40%. Confidence gives a measure of how exclusive a rule is: in this case 60% of the bread and milk baskets did not contain jam. A useful rule typically needs a much higher confidence – around 75% or more – so knowing someone purchased bread and milk does not really tell us all that much.
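In code, using the counts from the worked example above:

n_t = 1_000_000  # total shopping baskets (Nt)
n_l = 500_000    # baskets containing bread and milk (Nl)
n_r = 300_000    # baskets containing jam (Nr)
n_b = 200_000    # baskets containing bread, milk and jam (Nb)

confidence = n_b / n_l
print(f"confidence = {confidence:.0%}")  # confidence = 40%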
Support is specified as Nb/Nt. This gives us some measure of how ‘big’ the rule is. Typically users of ARM methods want a support greater than 1%; a rule really doesn’t mean very much if there are only five instances of it out of 1,000,000. In our example support is calculated as 200,000 / 1,000,000 = 0.2, or 20% – a very respectable number.
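Continuing the same sketch with the counts defined above:

support = n_b / n_t
print(f"support = {support:.0%}")  # support = 20%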
Completeness is specified as Nb/Nr. This tells us whether the rule predicts a substantial proportion of the instances where the ‘right’ combination is present. If it predicts only a small fraction, it would not be much use in promoting the products on the ‘right’. The completeness of our rule is 200,000 / 300,000 = 0.67, or 67% – once again very acceptable. There are many other measures of a rule, but these are the most common; we will look at others in subsequent articles.
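Completeness follows the same pattern, and the three measures can be combined into a single filter. In the sketch below the 1% support and 75% confidence thresholds echo the figures quoted above, while the 50% completeness cut-off is an arbitrary choice for illustration:

completeness = n_b / n_r
print(f"completeness = {completeness:.0%}")  # completeness = 67%

def rule_is_useful(n_t, n_l, n_r, n_b,
                   min_support=0.01, min_confidence=0.75, min_completeness=0.5):
    # A rule passes only if all three measures clear their thresholds.
    return (n_b / n_t >= min_support
            and n_b / n_l >= min_confidence
            and n_b / n_r >= min_completeness)

print(rule_is_useful(n_t, n_l, n_r, n_b))  # False: confidence is only 40%

Our bread-and-milk rule fails such a filter despite its healthy support and completeness, because its confidence of 40% falls well short of the 75% mark.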
ARM is used for many purposes other than analyzing shopping baskets; it can equally be applied to credit card purchases, medical symptoms and many other kinds of data. In the next article we will look at the algorithms used for ARM and their relative strengths and weaknesses.