Riding the Big Data Bubble

For people with short attention spans –

Big data is real, but in the short term it will cause more problems than it solves. The money people are investing in everything with the big data label attached to it, and this will create an investment bubble. Just as with the dot com boom, many of these businesses will IPO based on inflated claims. Joe Public will invest and eventually the whole thing will crumble – after the big money investors have sold out and made a stash. Big data is a generational thing, like the Internet, and once the bubble has burst we can get on with something meaningful.

For the rest of us –

Only those people stranded on a desert island in the Pacific will have failed to notice that we are ramping up to a very big tech bubble – the big data bubble to be precise. All bubbles are characterized by ‘irrational exuberance’, as Alan Greenspan would call it, and this one is no different. So let’s look at some of the drivers. Big data is driven by the simple fact that data has become much less expensive to acquire and process. Social data, customer data, web site data, and data from devices and other sources are freely available and begging to be used. These drivers are real, but collecting data for the sake of it doesn’t make sense – we need to do something with it, and there’s the rub.

Big data technologies have been evolving rapidly thanks to open source software (databases, analytical tools, distributed processing platforms) and commodity hardware. It’s all very immature, but this hasn’t inhibited tech company marketing departments from making some very bold statements about the business value of this stuff.

The big data dream goes something like this. Collect lots of data, then analyze it to uncover the behavior of customers, suppliers, employees, machines, the photocopier – and anything else that might come to mind. It sounds fine in theory, but the reality of making this work is wholly different. Collecting and managing very large amounts of data is a complex process – big data technologies allow it to happen, but do not ease the complexity (in fact they exacerbate it). It is well known that some of the methods used in big data (map reduce for example) are extremely difficult to use, and very skilled programmers are needed to fulfill even the most trivial tasks. Complexity is a breeding ground for chaos – and as Murphy’s Law states – anything that can go wrong will go wrong. So for big data to become feasible, the complexity has to be reduced and tools made available to manage it. This is happening, but not fast enough.

Next we come to the problem of analysis. Analyzing very large data volumes is beset with problems – read Michael I. Jordan (an expert in these matters) or watch one of his YouTube videos. Businesses will be jumping to all sorts of wrong conclusions based on faulty analysis. Most of this will be hidden from public view, but some disaster stories will leak out.

And finally there is a pressing management issue – the management of big data and big analytics activities. Businesses will produce (and some already are producing) hundreds of analytical models. Who will make sure they are documented, that they are understood by the people who are responsible for them, that they can be easily maintained and modified, and that at any particular time business managers can establish the impact of these models? It’s a huge issue, and one that is ignored by all but a few suppliers. The dominant message right now is that anyone with a half decent computer science or math degree can start producing models.

And worse than this, there is a whole new breed of suppliers selling ‘black boxes’ – put the data in one end and a predictive model drops out the other end. There are grave problems associated with this – but do your own research into over-fitting, data mining bias, random noise and other such topics. Disenchantment will surely follow, simply because it is all so new and we don’t have the skills to do this stuff properly.
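For anyone who wants a feel for data mining bias without doing the reading, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: the function names (`pearson`, `best_noise_feature`) and all the numbers (500 candidate features, 30 training rows, 1000 held-out rows) are assumptions made up for the example. Every column is pure random noise, so there is nothing real to find. The code simply picks the noise column that correlates best with the target on a small sample, then checks that same column on fresh data.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_noise_feature(n_features=500, n_train=30, n_test=1000, seed=0):
    """Search pure-noise features for the best in-sample 'predictor'.

    Returns (in-sample correlation, held-out correlation) for the single
    feature that looked best on the small training sample.
    """
    rng = random.Random(seed)
    n = n_train + n_test
    # The target is random noise: by construction nothing can predict it.
    target = [rng.gauss(0, 1) for _ in range(n)]

    best_r_train, best_r_test = 0.0, 0.0
    for _ in range(n_features):
        # Each candidate feature is also random noise.
        feature = [rng.gauss(0, 1) for _ in range(n)]
        r_train = pearson(feature[:n_train], target[:n_train])
        if abs(r_train) > abs(best_r_train):
            best_r_train = r_train
            # Score the same feature on rows it was not selected on.
            best_r_test = pearson(feature[n_train:], target[n_train:])
    return best_r_train, best_r_test


if __name__ == "__main__":
    r_train, r_test = best_noise_feature()
    print(f"best in-sample correlation: {r_train:+.2f}")
    print(f"same feature on held-out:   {r_test:+.2f}")
```

With enough candidates and few enough rows, the winning column tends to look like a respectable predictor in-sample and then falls back towards zero on the held-out rows. A black box that searches many variables and reports only the winner is exposed to exactly this effect unless it validates on data it has not already mined.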

None of this will inhibit the investment community from inflating the big data dream. The big money is getting into big data, and since these people do not like to lose money they will inflate the bubble, IPO their investments, and then get out. And just as Joe Public starts investing in already overpriced tech businesses, the whole thing will collapse. Right now it is truly difficult to keep up with the big data and analytics businesses that seem to be raising tens and hundreds of millions of dollars on a daily basis. Some of these will go on to become tomorrow’s Oracle and Microsoft – but most won’t.

In summary – the next five to ten years will see the inflation of a huge bubble (possibly bigger than the dot com bubble). Businesses will adopt big data and find it problematic, although the more competent will create significant gains from its use. Hundreds of big data startups will eventually IPO just before the bubble bursts – leaving the ordinary investor out-of-pocket, and the big money investors very much in-pocket. Longer term, big data, big analytics and the Internet-of-Things will change the world we live in, but in the shorter term it’s going to be turbulent. Make sure you’ve got your seat belt fastened.