In the last post on linear regression, I showed how the reassurance we might get from a nice, neatly sloping line can often be misleading. In this post I want to explore something called Simpson’s Paradox, where an upward-sloping linear regression can be composed of two downward-sloping linear regressions. While this is interesting in its own right, it also shows how easily we can be led to false conclusions.
To illustrate the point I created some data in Excel detailing the yearly sales of products versus the price of each product, with each point representing the yearly sales for a particular product at a particular price. As can be seen in the graph below, the points are widely scattered, but someone might still be tempted to put a linear regression line through them and be impressed that yearly sales increase as the price of the product increases. Management might even conclude that the future of the business lies in selling more up-market products with a higher price tag.
What isn’t evident is that this business actually sells two wholly different product ranges: laptops and cameras, say. I’ll just call them high price products (laptops) and low price products (cameras) in the graphs. So let’s have a look at the same plot, but this time just for the low price items. As the graph shows, there is a negative slope, so annual sales decrease as the price increases. Obviously this is at variance with the first graph, which includes all products.
So let’s have a look at the high price items. Once again there is a negative slope on the graph, showing that annual sales decrease with price here too. Oh dear, that decision by management to focus on high price products suddenly looks less viable.
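For anyone who wants to reproduce the effect outside Excel, here is a minimal Python sketch. The numbers are made up purely for illustration and are not the data behind the graphs above; the names `low_price`, `high_price` and `slope` are likewise just assumptions for this example. It fits an ordinary least-squares slope to each product range separately and then to all the products together.

```python
# Made-up (price, yearly sales) pairs, chosen only to illustrate the effect.
# They are NOT the Excel data used for the graphs in this post.
low_price = [(100, 500), (150, 480), (200, 460), (250, 440), (300, 420)]
high_price = [(800, 900), (900, 880), (1000, 860), (1100, 840), (1200, 820)]


def slope(points):
    """Ordinary least-squares slope of sales regressed on price."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of cross-deviations divided by sum of squared price deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


slope_low = slope(low_price)                # low price products only
slope_high = slope(high_price)              # high price products only
slope_all = slope(low_price + high_price)   # both ranges pooled together

print(f"low price only:  {slope_low:+.3f}")
print(f"high price only: {slope_high:+.3f}")
print(f"all products:    {slope_all:+.3f}")
```

With these made-up numbers the two separate slopes come out negative (-0.4 and -0.2), while the pooled slope is positive (about +0.45). In this sketch, at least, the pooled line slopes upward only because the high price range sits higher on both axes than the low price range, so the line ends up joining the two clusters rather than following the trend inside either one.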
Sometimes this behavior can be explained by hidden variables lurking in the data, but sometimes it is not so obvious. In our case, it really isn’t obvious why this should occur (although I’m happy to publish a solution if anyone has got one). In effect we have a positive trend created by two negative trends. Which one do you believe? Well, happily that is not my decision!