There are only two reasons we need data. The first is to satisfy regulatory requirements, and the second is to address business questions and decisions on an ongoing basis. These questions and decisions arise because the world is uncertain – if it weren’t, we could load a schedule into our computers and retire to the beach, or golf course, or whatever your opium happens to be. And so this leads to an interesting question – what comes first, the question or the data? Well, obviously it’s the question. So why all the fuss about holding petabytes of data, when the thing that really matters is the decisions we need to make?
I was prompted to write this article because of a comment made by Stuart Wells, executive vice president of products and technology, and chief technology officer at FICO. He said, “Many Big Data deployments have failed to deliver competitive advantage because their approach is completely backwards, we focus on decisions-first, as opposed to data-first.” And indeed, decisions do come first.
You won’t find much on this web site about ‘big data’ – as far as I was concerned, it was just plumbing, and another excellent example of how technology investment and business need can be almost totally divorced. In many situations it really will not make much difference whether the data you use for analysis amounts to a couple of gigabytes or to terabytes. In fact, ten thousand data instances might tell you just as much as a billion in many cases. There are of course situations where a billion rows of data might convey more information – but they are surprisingly uncommon. The ‘big data’ feeding frenzy was just a very good marketing campaign by large suppliers with enough marketing budget to make a difference. It allowed them to shift more hardware and specialized software. But just as Hadoop was the flavor of the month a couple of years ago, and Spark is now, so our fascination with data will subside and hopefully give way to the real issue – how we make better decisions at operational, tactical and strategic levels. And once we have come to some understanding of this central issue, then we can decide whether we need petabytes of noise-filled data.
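To put a rough number on the ten-thousand-versus-a-billion point, here is a minimal sketch of how the uncertainty in an estimated rate shrinks as the sample grows. It rests on assumptions of my own, none of which come from any particular deployment: the rows are a simple random sample, the quantity of interest is a single proportion (a hypothetical 5% response rate), and the usual normal approximation applies. The function name is made up for illustration.

```python
import math


def proportion_margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p estimated from n rows.

    Assumes independent, randomly sampled rows and the normal approximation;
    z = 1.96 corresponds to a roughly 95% confidence level.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    assumed_rate = 0.05  # hypothetical response rate, for illustration only
    for n in (10_000, 1_000_000, 1_000_000_000):
        moe = proportion_margin_of_error(assumed_rate, n)
        print(f"n = {n:>13,}   margin of error = +/- {moe * 100:.4f} percentage points")
```

Under these assumptions, ten thousand rows already pin the rate down to within roughly half a percentage point, and every hundredfold increase in rows only tightens that by a factor of ten. Whether the extra precision is worth anything depends on the decision being made. The cases where it does matter, such as very rare events or many fine-grained segments, are the exceptions mentioned above.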
At a slightly more abstract level, almost everything that is happening with technology right now is aimed at helping businesses deal with uncertainty. Who are the best targets for a marketing campaign? Which deals are most likely to close in the current period? How should production of various items be organized so that profit is maximized? … and on and on. These are the real issues, not how big your data lake is.
Data is a bit like electricity. You need it to do things. But ultimately it’s what you do that matters, and not the amperes or gigabytes surging down their respective circuits. Decision automation is where we are headed, and it is why AI, machine learning and associated technologies are so important. Data is simply a necessary, but far from sufficient, commodity.