As with all information technologies, data mining offers an opportunity to increase the efficiency and effectiveness of an organisation. The core idea behind data mining is that, through the use of appropriate technologies, we can identify patterns of behaviour in customers, employees, suppliers, machinery and in fact any aspect of the organisation, provided data has been captured. These patterns would then allow us to improve our understanding of processes and in some instances predict the outcome of a situation. Obviously this has great appeal, and the suppliers of data mining technology are not shy when it comes to advertising the potential benefits. But as anyone who has been involved with information technology in a large organisation for more than two weeks will tell you: all that glitters is not gold.
The costs associated with data mining are as small or large as you want them to be. At a minimum you might download a free data mining tool (RapidMiner for example) and with the necessary skills start mining data. Patterns will be found – this is guaranteed. Whether they are useful or accurate is another matter completely, and this is where most of the risk creeps in. The same issues materialise if you spend hundreds of thousands or millions of dollars on your data mining technology. You may be able to process larger data sets with greater speed, and present very pretty visualisations – but the problems of usefulness and accuracy still remain.
If your organisation can afford to employ a room of PhD statisticians, data mining experts, mathematicians and domain experts then the risks are going to reduce. It should be remembered, though, that this is precisely the sort of team employed by the large banks prior to the 2008 credit crisis, which was largely created by complex but inappropriate modelling of derivatives. Even so, such a team of people should easily spot the ghosts in your data that have no reality in your business. So provided you spend wisely, the risks can be reduced by hiring experts. But these people are not cheap. Their salaries are often in excess of a hundred thousand dollars a year, and they are quite difficult to find.
The opposite extreme is to use a plug-and-play data mining tool and blindly accept its findings. Please do not do this: things will almost certainly go badly wrong. The patterns that are found may be no more than apparitions in your data with no existence in reality, and to act on them may be costly. There are ways to minimise these apparitions, such as testing a pattern against data that was held back from the mining, but they are quite technical and, even then, not foolproof.
The key to using data mining technologies successfully is people, and particularly people who understand the domain where the technologies are being used. There is a famous story reported in the Wall Street Journal of attempts to use data mining in financial markets. It was found that US stock returns could be predicted with 99% accuracy if US cheese production and the total population of sheep in the US and Bangladesh were used as inputs. Clearly this is nonsense, but this is what data mining came up with. A domain expert (someone who knows about financial markets) knows that this is nonsense.
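To see how easily such nonsense arises, the following is a minimal sketch in Python, not taken from the story above. It assumes a purely random "target" series and a large pool of equally random candidate inputs, so there is nothing real to find. The function name mine_random_data, the number of candidates and the length of each series are all illustrative assumptions. The sketch picks the candidate that best matches the target over a mining period, then checks the same candidate against a holdout period that played no part in the search.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def mine_random_data(n_candidates=1000, n_obs=20, seed=0):
    """Pick the noise series that best 'explains' a noise target, then re-test it.

    Returns (correlation on the mined period, correlation on the holdout period).
    """
    rng = random.Random(seed)
    # Target and candidates are independent noise: any pattern is an apparition.
    target = [rng.gauss(0, 1) for _ in range(2 * n_obs)]
    best_in, best_out = 0.0, 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(2 * n_obs)]
        r_in = pearson(candidate[:n_obs], target[:n_obs])
        if abs(r_in) > abs(best_in):
            best_in = r_in
            # Score the same candidate on data it was not selected on.
            best_out = pearson(candidate[n_obs:], target[n_obs:])
    return best_in, best_out


# Repeat with different seeds so the comparison does not hinge on one lucky draw.
trials = [mine_random_data(seed=s) for s in range(10)]
mean_in = sum(abs(r_in) for r_in, _ in trials) / len(trials)
mean_out = sum(abs(r_out) for _, r_out in trials) / len(trials)
print(f"mean |r| on mined period:   {mean_in:.2f}")
print(f"mean |r| on holdout period: {mean_out:.2f}")
```

The correlation on the mined period comes out strong simply because a thousand candidates were tried, while the holdout correlation for the same "winner" is typically much weaker. A holdout check of this kind reduces the risk but does not remove it, and it does not replace a domain expert asking whether the relationship makes any sense.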
A strongly supervised data mining initiative (supervised by domain experts) has many benefits, but ultimately the benefits have to exceed the costs and be worth the risks. One of the most common uses of data mining is in sales and marketing. Market basket analysis is widely used to establish the buying habits of customers, typically in a retail scenario. Prospecting, estimating response rates, fine-tuning messages and so on are all fair game for data mining. And just as data mining does present real risks, it also presents the opportunity to significantly improve the fortunes of an organisation.
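As a rough illustration of the market basket analysis mentioned above, here is a minimal Python sketch that scores pairs of items using the three measures commonly quoted for such rules: support, confidence and lift. The transactions are invented for the example and the function name pair_rules is hypothetical. Real tools handle larger item sets and far more data than this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
    {"milk", "bread"},
]


def pair_rules(transactions):
    """Return (lhs, rhs, support, confidence, lift) for every co-occurring pair."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            support = count / n                       # share of baskets with both items
            confidence = count / item_counts[lhs]     # share of lhs baskets that also hold rhs
            lift = confidence / (item_counts[rhs] / n)  # confidence relative to rhs base rate
            rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Show the rules with the highest lift first.
for lhs, rhs, support, confidence, lift in sorted(
    pair_rules(transactions), key=lambda rule: rule[4], reverse=True
):
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than their individual popularity would predict. With only six baskets, though, any such rule is exactly the sort of pattern that needs a domain expert and more data before anyone acts on it.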
Ultimately, data mining is all about uncovering information, and someone in the organisation needs to ensure that the costs of unearthing this information are smaller than the benefits it delivers.