Predictive technologies are quite different from the run-of-the-mill information technologies we’ve used in the past. They have something to say about the future, which is intriguing to say the least, but as you will soon see it is also potentially very dangerous. Traditional IT has focused on dumb systems – they store data, process it and maybe pump out a few reports. There is no real value added to the data, other than processing it for human consumption. Predictive technologies, on the other hand, take data and extrapolate into the future. The mechanism for doing this is often called scoring. When a new customer signs up, for example, you might want to score them for creditworthiness, and predictive analytics could produce that score by analysing existing customers with a similar profile. It all sounds innocent enough.
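To make the idea of scoring concrete, here is a minimal sketch that scores a new customer as the share of the most similar past customers who repaid. The customer records, the three profile fields and the `credit_score` function are all hypothetical, and k-nearest-neighbours is used only because it is easy to read, not because it is what any particular product does.

```python
import math

# Hypothetical history: (age, income in thousands, years at address) -> 1 repaid, 0 defaulted
HISTORY = [
    ((25, 30, 1), 0),
    ((32, 45, 3), 1),
    ((47, 80, 10), 1),
    ((51, 62, 8), 1),
    ((23, 22, 0), 0),
    ((38, 55, 5), 1),
    ((29, 28, 2), 0),
    ((44, 70, 7), 1),
]


def credit_score(new_profile, history=HISTORY, k=3):
    """Score a new customer as the fraction of the k most similar
    past customers (by Euclidean distance) who repaid."""
    neighbours = sorted(history, key=lambda item: math.dist(item[0], new_profile))[:k]
    return sum(label for _, label in neighbours) / k


print(credit_score((45, 68, 6)))  # nearest neighbours all repaid
print(credit_score((24, 25, 1)))  # nearest neighbours all defaulted
```

The score is only as good as the history behind it. The features are also left unscaled here for brevity, so income quietly dominates the distance, which is exactly the sort of hidden choice the rest of this article warns about.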
The first thing to bear in mind with predictive technologies is that they are very powerful – quite unlike anything most of us have used before. This power is a two-edged sword: it can deliver significant benefits, or it can result in wildly inaccurate scoring of new information. What distinguishes these technologies is the level of mathematical sophistication they embrace. Few of the people using them really understand the inner workings of the algorithms. Support vector machines are one of the more powerful techniques used, and unless you are very conversant with linear algebra in infinite-dimensional spaces (and who is?), you will be driving blind to some extent. Used correctly, the results can be astonishing; used wrongly, they will lead you up a very long garden path. In fact this is true of many highly mathematical techniques. The maths is just a framework – a sausage machine. Put rubbish in and you will get rubbish out. The maths will not turn a bunch of meaningless data into a life-changing revelation, although it may well appear to have done exactly that, as I’ll explain shortly.
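The "rubbish in, rubbish out" point can be shown in a few lines. This sketch generates features and labels that are completely unrelated, then applies a one-nearest-neighbour classifier. That classifier is a simple stand-in for a flexible model, not an SVM. On the data it was built from it looks flawless, but on fresh data it does no better than a coin toss. All names and sizes are illustrative.

```python
import random


def make_random_data(n, n_features, rng):
    """Features and labels drawn independently: there is no real pattern to find."""
    X = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def predict_1nn(train_X, train_y, x):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def accuracy(train_X, train_y, X, y):
    hits = sum(predict_1nn(train_X, train_y, xi) == yi for xi, yi in zip(X, y))
    return hits / len(y)


rng = random.Random(42)
train_X, train_y = make_random_data(200, 5, rng)
test_X, test_y = make_random_data(200, 5, rng)

train_acc = accuracy(train_X, train_y, train_X, train_y)  # each point is its own nearest neighbour
test_acc = accuracy(train_X, train_y, test_X, test_y)     # hovers around chance (0.5)
print(f"training accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

The perfect training score is the "pretence" of a revelation. Only the held-out score reveals that nothing was learned.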
The central issue with predictive technologies is that they use historical data to find patterns, which are then used to score new data. As your investment advisor will tell you, past performance is no guide to future performance, and the same is true of most data. Statisticians call this lack of continuity non-stationarity. It means the characteristics of your data change over time in a way that makes statistical analysis difficult, if not impossible. So just because people with blue eyes have traditionally bought gold-rimmed spectacles does not mean they will continue to do so in the future.
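Here is a minimal sketch of what non-stationarity does to a model, using the spectacles example. It assumes a purely hypothetical drift in which 80% of blue-eyed customers chose gold rims last year and only 30% do this year. The "model" simply predicts whatever was most common in the past.

```python
import random


def simulate_purchases(n, p_gold, rng):
    """Simulate n purchases by blue-eyed customers; True means gold rims were chosen."""
    return [rng.random() < p_gold for _ in range(n)]


rng = random.Random(7)
# Hypothetical drift: gold rims were popular last year, much less so this year.
last_year = simulate_purchases(1000, p_gold=0.8, rng=rng)
this_year = simulate_purchases(1000, p_gold=0.3, rng=rng)

# "Model" learned from history: predict whichever choice was most common in the past.
predict_gold = sum(last_year) / len(last_year) > 0.5


def hit_rate(purchases, prediction):
    """Fraction of purchases the fixed prediction gets right."""
    return sum(p == prediction for p in purchases) / len(purchases)


print(f"accuracy on the data it learned from: {hit_rate(last_year, predict_gold):.2f}")
print(f"accuracy after the market shifted:   {hit_rate(this_year, predict_gold):.2f}")
```

Nothing about the model changed. The world did, and a score built on last year's pattern is now wrong most of the time.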
Perhaps the most misleading aspect of using predictive technologies is that they nearly always do what you want them to do: they find patterns. The problem is that these patterns may be nothing more than curve-fitting, that is, rules which happen to fit data that is actually random. If your predictive systems tell you that customer complaints increase with the number of cups of coffee people drink in your office, you would sensibly discount this rule as curve-fitting – an accidental correlation between two quite separate types of information. On the other hand, if your systems say that customer complaints increase with the volume of sales your business makes, should you believe this or not? The statisticians have an answer for everything (they curve-fit too) and will tell you that tests for statistical significance generally solve the problem. Sometimes they do and sometimes they don’t. Test enough candidate patterns and some will pass by chance alone, so you can never be sure.
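The following sketch shows why significance tests are not a cure-all. It correlates 200 hypothetical variables with daily complaint counts, even though every variable is random by construction. It uses the common large-sample rule of thumb that |r| > 1.96/√n is roughly "significant at the 5% level". That is an approximation, not an exact test.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(1)
n_days = 100
complaints = [rng.gauss(20, 5) for _ in range(n_days)]

# 200 hypothetical candidate variables that, by construction, have nothing to do with complaints.
candidates = {f"variable_{i}": [rng.gauss(0, 1) for _ in range(n_days)] for i in range(200)}

# Approximate 5% two-sided threshold for |r| under the "no relationship" assumption.
threshold = 1.96 / math.sqrt(n_days)
significant = [name for name, xs in candidates.items()
               if abs(pearson_r(xs, complaints)) > threshold]

print(f"{len(significant)} of {len(candidates)} unrelated variables look 'significant'")
```

Around one in twenty will typically pass. Corrections for multiple comparisons, such as Bonferroni, reduce these false alarms, but they also make genuine effects harder to detect. Only someone who knows the business can say which surviving rules make sense.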
The answer to all of these issues is the involvement of people, specifically domain experts. While we may swoon at the latest sexy technology, a more mature approach is to use it under human supervision and intervention. One of the largest suppliers of statistical and predictive technologies ran an advert that said ‘trust the numbers’. A more irresponsible statement I cannot imagine. Please do not trust the numbers. I should add here that I’m an ex-mathematician and not some sort of Luddite. Trusting the numbers blindly is a disaster in the making. Numbers plus a supervising eye is a low-risk way of exploiting these exceptional technologies.
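One way to build a "supervising eye" into a scoring system is sketched below, under stated assumptions. Borderline scores are referred to a domain expert, and a random sample of the automated decisions is also flagged for review, so the experts keep checking the confident cases too. The thresholds, the `audit_rate` parameter and the `Decision` record are hypothetical choices for illustration, not a prescribed design.

```python
import random
from dataclasses import dataclass


@dataclass
class Decision:
    customer_id: str
    score: float
    action: str          # "approve", "decline" or "refer"
    needs_review: bool   # True if a domain expert should look at it


def route(customer_id, score, rng, approve_at=0.8, decline_at=0.2, audit_rate=0.05):
    """Automate only confident scores, refer the rest, and audit a sample of automated ones."""
    if score >= approve_at:
        action = "approve"
    elif score <= decline_at:
        action = "decline"
    else:
        action = "refer"
    # Every referral is reviewed; automated decisions are spot-checked at audit_rate.
    needs_review = action == "refer" or rng.random() < audit_rate
    return Decision(customer_id, score, action, needs_review)


rng = random.Random(0)
for cid, s in {"C001": 0.93, "C002": 0.55, "C003": 0.12}.items():
    print(route(cid, s, rng))
```

The audit sample matters as much as the referral band. It is how experts notice when the confident scores themselves start drifting.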
While predictive technologies are alluring, please do not think they offer an easy way to extract meaning from your data. They don’t. The field is full of traps, and the people who build your predictive systems need to know what they are doing. It’s a long-term project, not some three-month wonder that will transform your fortunes. Black box solutions are perhaps the most dangerous approach of all. As I mentioned earlier, ‘rubbish in, rubbish out’ will materialise just as readily with a black box as with any other type of predictive software.
Used well, predictive technologies will add significant value to your business operations. Used with blind faith in the technology, they may leave you with the dumbest smart applications in your industry.