On the face of it, big data analytics simply requires data, analytics tools and some means of deploying analytical models into the operational environment. Such a simple-minded approach would be a sure route to ruin, since many skills, resources and methods need to work together if big analytics is to deliver on its promise.
First, there is the data itself. In many respects big data is no different from any other data: it needs to be understood and pre-processed before it can be used in analytics. This means that business users, analysts and data scientists need tools that support data exploration and visualization while hiding the complexities associated with big data. Understanding the meaning behind the data is crucial if we are to avoid deploying meaningless predictive models that are no more than ‘accidents’ of the data. However, big data does present some unique challenges, particularly when the data are characterized by many attributes. There is a common misconception that large volumes of data cannot lie or mislead. They can, and they do so in ways that take skill and experience to detect. Big data may be relatively new, but there is no substitute for experience.
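To make the point about many attributes concrete, the minimal Python sketch below (the function names and parameters are hypothetical, chosen for illustration) generates a random target and 1,000 random attributes over only 50 records. Because every value is independent noise, any attribute that appears to track the target is exactly the kind of ‘accident’ of the data described above. With this many attributes, the strongest correlation is typically large enough to look meaningful, even though nothing real connects the attribute to the target.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def strongest_spurious_correlation(n_rows=50, n_attributes=1000, seed=42):
    """Return (attribute index, r) for the noise attribute most correlated with a noise target."""
    rng = random.Random(seed)
    # The target and every attribute are independent Gaussian noise, so any
    # correlation found between them is purely an accident of the sample.
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best_index, best_r = -1, 0.0
    for j in range(n_attributes):
        attribute = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson(attribute, target)
        if abs(r) > abs(best_r):
            best_index, best_r = j, r
    return best_index, best_r


index, r = strongest_spurious_correlation()
print(f"attribute {index} has r = {r:.2f} with the target, yet it is pure noise")
```

Raising `n_attributes` while holding `n_rows` fixed tends to push the strongest spurious correlation higher, which is one reason wide data sets call for holdout validation and domain review before a model is trusted.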
There is now a plethora of analytics tools, particularly for predictive analytics, that provide stand-alone model development environments. For modest analytical activities these may be sufficient, but for enterprise needs they will inevitably fall short in management, integration and deployment capabilities. Organizations that automate their decision processes typically run hundreds, and possibly thousands, of predictive models. Documenting, monitoring and maintaining these models, and keeping them visible, requires a decision management infrastructure; without one, the ability to manage the decisions those models are making decays rapidly. Model building also needs to be integrated with model validation, monitoring and modification, so an integrated design, build, validation and monitoring environment is needed. Anything less and trouble will surely follow.
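One common way to monitor a deployed model is to compare the distribution of its scores in production against the distribution seen at validation time. The sketch below uses the population stability index (PSI), a drift measure widely used in credit scoring; it is one illustrative choice rather than a method prescribed here, and the function names, bin count and simulated scores are assumptions. A widely quoted rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a significant shift, but such thresholds are conventions, not guarantees.

```python
import bisect
import math
import random


def bin_proportions(values, edges):
    """Share of values in each bin; `edges` are the sorted interior bin edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_right returns how many edges are <= v, i.e. the bin index.
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(baseline, current, n_bins=10, eps=1e-6):
    """PSI between a baseline score sample and a current one (0 means no shift)."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins
    # Equal-width bins over the baseline range; out-of-range values land in the end bins.
    edges = [lo + width * k for k in range(1, n_bins)]
    expected = bin_proportions(baseline, edges)
    actual = bin_proportions(current, edges)
    psi = 0.0
    for e, a in zip(expected, actual):
        # Floor empty bins at eps so the logarithm stays finite.
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


# Simulated (hypothetical) model scores: one stable sample, one shifted sample.
rng = random.Random(7)
validation_scores = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stable_scores = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted_scores = [rng.gauss(0.8, 1.0) for _ in range(5000)]

psi_stable = population_stability_index(validation_scores, stable_scores)
psi_shifted = population_stability_index(validation_scores, shifted_scores)
print(f"stable: {psi_stable:.3f}, shifted: {psi_shifted:.3f}")
```

In a decision management infrastructure, a check like this would run on a schedule for every deployed model, with models that cross the chosen threshold flagged for revalidation.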
In reality, most predictive models will be deployed into mainstream production systems – sales, finance, marketing, purchasing and most other activities relevant to a given organization. Since the processes associated with these activities are often created, deployed and monitored using business process management (BPM) techniques and infrastructure, it makes sense to integrate predictive models that automate decisions with the BPM environment. To this end, a recently ratified standard known as Decision Model and Notation (DMN) has emerged to link mainstream business processes with the decision models they use. This is essential if decision-enhancing technologies are to be widely and successfully adopted. Anything less and we end up with the islands of information and automation that plagued business applications before the adoption of integrated application suites (ERP and CRM, for example).
Deployment mechanisms must also be considered. Many models can be represented in a form (typically decision tables or decision trees) that is easily deployed in a business rules management system (BRMS). This has many advantages, not least visibility to business users, ease of modification and all the benefits of a central repository. Other models cannot be represented in this format, so it is crucial that the decision management environment supports a unified view of all decision models, both deployed and under development, within the organization.
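As a rough illustration of why decision tables suit a BRMS, here is a minimal Python sketch of a hypothetical credit decision table. The column names, thresholds and outcomes are invented for the example, and the `risk_score` input is assumed to come from a predictive model. Rows are evaluated top to bottom and the first matching row wins, similar in spirit to the "first" hit policy in DMN decision tables; this is not the API of any particular BRMS.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

INF = float("inf")


@dataclass(frozen=True)
class Row:
    """One decision-table row: half-open ranges [low, high) per input, plus an outcome."""
    risk_score: Tuple[float, float]
    income: Tuple[float, float]
    outcome: str


# Hypothetical credit decision table; risk_score is assumed to be a model output.
CREDIT_LIMIT_TABLE = [
    Row(risk_score=(0.7, INF), income=(-INF, INF), outcome="decline"),
    Row(risk_score=(0.4, 0.7), income=(-INF, 30_000), outcome="refer"),
    Row(risk_score=(0.4, 0.7), income=(30_000, INF), outcome="approve_low_limit"),
    Row(risk_score=(-INF, 0.4), income=(-INF, INF), outcome="approve_standard_limit"),
]


def in_range(value: float, bounds: Tuple[float, float]) -> bool:
    low, high = bounds
    return low <= value < high


def decide(table, risk_score: float, income: float) -> Optional[str]:
    """First-hit evaluation: return the outcome of the first matching row, or None."""
    for row in table:
        if in_range(risk_score, row.risk_score) and in_range(income, row.income):
            return row.outcome
    return None


print(decide(CREDIT_LIMIT_TABLE, risk_score=0.5, income=25_000))
```

Because the rows are plain data rather than code, they can be listed, reviewed and changed by business users without touching the evaluation logic, which is the property that makes such tables attractive in a central rules repository.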
Finally, not all models are predictive in nature. Some are needed to optimize the use of resources, and other methods of optimizing and predicting outcomes will inevitably emerge. These methods can often be used together, and that is feasible only in an integrated environment.
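A small, hypothetical example of prediction and optimization working together: a predictive model supplies each customer's probability of responding to an offer, and a simple optimization decides whom to contact under a fixed contact budget. The customer data, costs and function names below are invented for illustration. With equal cost per contact and a single budget on the number of contacts, taking the highest expected profits first is optimal.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Customer:
    name: str
    response_probability: float  # assumed to come from a deployed predictive model
    value_if_responds: float


def plan_campaign(customers: List[Customer], budget: int,
                  cost_per_contact: float) -> Tuple[List[Customer], float]:
    """Choose at most `budget` customers to contact, maximizing expected profit."""
    scored = [(c.response_probability * c.value_if_responds - cost_per_contact, c)
              for c in customers]
    # Every contact costs the same and uses one slot, so taking the highest
    # expected profits first (and skipping loss-making contacts) is optimal.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    selected = [(profit, c) for profit, c in scored[:budget] if profit > 0]
    return [c for _, c in selected], sum(profit for profit, _ in selected)


# Hypothetical customers with model-predicted response probabilities.
CUSTOMERS = [
    Customer("A", 0.30, 200.0),
    Customer("B", 0.05, 1000.0),
    Customer("C", 0.50, 50.0),
    Customer("D", 0.02, 100.0),
]

chosen, expected_profit = plan_campaign(CUSTOMERS, budget=2, cost_per_contact=10.0)
print([c.name for c in chosen], round(expected_profit, 2))
```

Customer B is contacted ahead of C despite a far lower response probability, because the optimization weighs probability against value. With unequal costs or additional constraints, the selection becomes a knapsack-style problem that calls for a proper optimization solver rather than this greedy rule.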
Despite the hype, big data is very similar to ‘old data’ in many ways, although it does present some new challenges. We shouldn’t be carried away by the notion that more data is necessarily better data – only someone who really understands the business domain can say whether that is the case. However, the availability of large volumes of diverse, low-cost data does present many new analytical opportunities to create optimized decision models. Many businesses in diverse industries are realizing the benefits of big data analytics, and the most successful are those that have chosen an integrated approach based on decision management methods, technologies, skills and disciplines.
The previous article in this series is Big Analytics Technologies
The next article in this series is Big Analytics Applications