It is often said that the only people who made their fortunes in the California gold rush were those selling spades and sieves. There is a striking similarity with the contemporary rush to anything bearing a name such as “analytics”, “predictive analytics” or “machine learning”. The people selling the technology are doing very well, thank you; whether the users of the technology are also doing well is much harder to say, simply because no one is measuring. Applying technologies such as machine learning to business data is problematic. While machine learning can boast considerable success in static environments (where the nature of the problem does not change), it is beset with problems in dynamic environments, such as those found in most businesses.
Data mining, and its close associate machine learning, use historical data to identify patterns that might be useful in the future. A simple example is building a profile of the people who have purchased a particular product. Perhaps age, income, education and hobbies identify the people most likely to look favorably on it. That profile can then be used to target prospects in the future, in the belief that the cost of marketing and sales can be reduced while achieving greater revenue.
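The idea of a purchase profile can be sketched in a few lines. The following is a minimal illustration with synthetic data: the segment names, the 5,000-customer sample, and the planted purchase rates are all invented for the example, so that there is a real pattern for the profiling step to find.

```python
import random
from collections import defaultdict

random.seed(1)

# Hypothetical customer records: (age_band, income_band, bought_product).
# Young, high-income customers are deliberately made more likely to buy,
# purely so the example contains a genuine pattern.
def make_customer():
    age = random.choice(["18-34", "35-54", "55+"])
    income = random.choice(["low", "mid", "high"])
    buy_rate = 0.30 if (age == "18-34" and income == "high") else 0.05
    return age, income, random.random() < buy_rate

customers = [make_customer() for _ in range(5000)]

# Purchase rate per (age, income) segment -- this table is the "profile".
totals, buyers = defaultdict(int), defaultdict(int)
for age, income, bought in customers:
    totals[(age, income)] += 1
    buyers[(age, income)] += bought

rates = {seg: buyers[seg] / totals[seg] for seg in totals}
best_segment = max(rates, key=rates.get)
print(best_segment, round(rates[best_segment], 2))
```

With enough data the planted segment surfaces clearly; the trouble described below begins when the data is thinner and the candidate segments far more numerous.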
It all sounds fairly straightforward, but in reality it is far from simple. Running the various algorithms against data will often yield hundreds, if not thousands, of candidate patterns. Most (and sometimes all) of these will be nothing more than coincidences: random patterns with no significance or predictive capability at all. To weed out the random flukes we reserve a portion of our data for testing. If a pattern holds on the test data, we can be more confident that it reflects something real. There are more elaborate versions of testing, but they all rest on a single idea: if a pattern appears both in the data used for learning and in the data used for testing, it is likely to be valid. In reality it is not so simple. Say we discover ten thousand patterns when running algorithms against our learning data. We then run these patterns against the test data and find that a hundred hold up. Had we reserved a second batch of test data, we would find that perhaps only ten of those hundred still held up, and so it would continue. All we are doing here is thinning out the candidate patterns by placing ever more constraints on them (they have to work on successive batches of previously unseen data). The whole process involves algorithm selection, algorithm parameter selection, training data selection, test data selection, and so on. It is a grand form of curve fitting: changing all the variables until the data gives us what we are looking for, when what we are looking for may well be simply misguided.
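The 10,000 → 100 → 10 attrition described above is easy to reproduce. The sketch below is a simulation under stated assumptions: every "pattern" is pure noise (a skill-free coin evaluated against 50 random outcomes), yet a fixed accuracy threshold lets a predictable fraction survive each round of "testing". The pattern count, sample size and threshold are chosen for illustration, not taken from any real study.

```python
import random

random.seed(0)

def accuracy(n_obs: int) -> float:
    """Measured accuracy of a skill-free 'pattern' on n_obs random outcomes."""
    return sum(random.random() < 0.5 for _ in range(n_obs)) / n_obs

N_PATTERNS, N_OBS, THRESHOLD = 10_000, 50, 0.6

# Stage 1: "discover" patterns that look good on the learning data.
survivors = [p for p in range(N_PATTERNS) if accuracy(N_OBS) >= THRESHOLD]
# Stage 2: re-check the survivors on a first held-out test set.
after_test1 = [p for p in survivors if accuracy(N_OBS) >= THRESHOLD]
# Stage 3: re-check again on a second held-out set.
after_test2 = [p for p in after_test1 if accuracy(N_OBS) >= THRESHOLD]

print(len(survivors), len(after_test1), len(after_test2))
```

Roughly 10% of noise patterns clear a 60% accuracy bar on 50 observations by chance alone, so each round of testing thins the field by about a factor of ten, exactly the shrinking cascade the paragraph above describes, with no real pattern anywhere in the data.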
Let’s be generous and say that the people using machine learning technologies are aware of these issues and have skillfully navigated around them (despite the fact that this is scarcely possible). Most businesses operate in highly competitive environments where opportunities are quickly identified and discounted. The pattern that profiled the people who bought one of your products may no longer be effective, particularly if the competition has seen the same pattern. Efficient markets (most markets these days) do not afford the luxury of simply finding patterns that worked in the past and reusing them in the future.
These issues apply not only to machine learning but to all forms of analytics applied to business, and particularly to visual analytics. Visual recognition of patterns (trends, cycles, etc.) is subjective and riddled with problems. We want to see patterns, and business environments are not exactly impartial: there may be an awful lot at stake for a business manager if a line does not slope upwards.
The net result of all this is that using analytics technologies effectively is difficult, while the vendors of spades and sieves would have us believe it is easy. A recent article in the Guardian likened machine learning to witchcraft: an unfounded belief that it can work magic.
Lest this sound like pure naysaying, I should add that I know of some very effective machine learning solutions in business. But such solutions are not easily come by, and most important of all, the problem has to be well understood and well defined, something a machine cannot do (yet).