Big data is driven by the rapidly falling cost of acquiring, managing and processing ever-increasing amounts of data. Clearly we shouldn't accumulate more data simply because it is possible to do so; the whole big data concept is predicated on the notion that we can derive value from our data assets. Some data has to be preserved for regulatory reasons, and all of it can fuel analytics, helping businesses make better-informed and more effective decisions at all levels.
Although we have mostly been impressed with big data infrastructure technologies, these are in reality not much more than plumbing: bigger tanks, fatter pipes, more of them, and different shapes. Commodity hardware and open source software have been the prime drivers of reduced data storage and processing costs. Traditional data management systems, mainly relational database management systems (RDBMS), have handled transactional data very effectively, but they are less well suited to streaming data (financial market data feeds, for example), documents and text, images, and more esoteric data types such as geospatial data. Operational RDBMS are not particularly good at serving large-scale analytical queries and reporting needs either, a gap that has been partly filled by data warehouse technologies, in which data are structured to make reporting faster and more flexible.
As a result of these limitations, a wide variety of new data storage techniques have emerged, and distributed database architectures have evolved to handle vastly greater volumes of data, the best known being the Hadoop ecosystem of technologies.
The concept of big data has become prominent because of the convergence of greatly enhanced technical capability and the reduced cost of data acquisition. The latter is driven largely by data generated outside the organization, primarily by customers and social media, and by the rapidly evolving Internet of Things (IoT), in which devices such as mobile and medical devices generate data at very low cost.
So the scene is set for acquiring massive amounts of data of various types, and for processing it at speeds that were previously prohibitively expensive. This is a necessary but hardly sufficient condition for using big data technologies. We need to add analytics to create value from an otherwise inert resource. Analytical activities can be broadly divided into those concerned with historical and current activity, and those that predict future behavior. Business intelligence (otherwise known as descriptive analytics) is a well-established set of methods and technologies that allows users to look through the rear-view mirror and analyze historical performance. Predictive and prescriptive analytics allow businesses to learn from the past and plan for the future. Predictive analytics uses data mining technologies to detect patterns in historical data that can be applied to future activity, while prescriptive analytics determines the best use of resources given a set of constraints and objectives. These are complemented by business rules technologies and methods, which support the creation and maintenance of potentially thousands of rules governing operational activity within the business.
And so big data, as it has become known over the last few years, is largely an infrastructure issue. This infrastructure exists so that we can manipulate and exploit low-cost data resources to drive high-value analytical models, which improve both the efficiency and the efficacy of business operations. To this end we need another layer of technologies, methods and skills to manage analytical activities, which can produce thousands of decision-enhancing analytical models. Nor should we forget the central roles of business know-how and analytical skills, since without them the whole edifice is of little value.
Such is the impact of big data analytics that it is useful to separate the traditional use of information technology to automate transactional activity and business processes from the emerging use of technology to automate business decisions, typically at the operational level but also as input to tactical and strategic decision making. Just as businesses differentiated themselves by the efficiencies they realized through transaction and process automation, there is now a new opportunity to differentiate through the efficiencies and improved effectiveness that decision automation and management create. Big data analytics is the hub around which much of this new opportunity will revolve.
The next article in this series is Big Analytics Technologies