Business Intelligence 2016
For almost as long as I can remember, various pundits have been forecasting BI trends – and guess what, the most accurate forecast they could have made would be that nothing was going to change. Just three or four years ago BI output largely consisted of paginated reports, the latency between need and fulfillment was unbearable, combining data sources was tortuous, and visual appeal was almost non-existent. Happily this is no longer the case, and so it really is worthwhile looking at how BI will evolve over the next few years.
For the incurably impatient here is a list of significant developments in BI that will materialize over the coming few years:
- Smart BI – you will no longer be on your own when interpreting what a chart or graph means.
- Advanced Analytics – easy-to-use advanced analytical techniques such as clustering and regression. Users want more than charts and graphs.
- Taming data complexity – lots of data sources and different types of data create complexity. BI tools will increasingly handle this complexity with varying degrees of automation.
- Speed – more memory and faster processors mean more data can be processed in-memory, and in extreme cases the processor memory itself can be utilized for very fast processing.
- Embedded BI – becomes easier and is the real route to pervasive BI. Users get their BI embedded into applications they use every day.
- Greater governance – to reduce the ‘islands of information’ problem and create a single version of the truth.
Smart BI
Contemporary BI platforms allow users to visualize data in a bewildering number of ways. However, once the data has been processed and formatted for display, the user is left alone to establish meaning. Interpretation requires domain knowledge, and some understanding of what a visualization is saying. Because all data contains a significant amount of random noise, it is just too easy to see meaning where there is none. And so users need assistance – guidance on what is significant and what isn't. Some BI tools already offer rudimentary metrics – the favorite being a p-value, which estimates how likely it is that an apparent feature is merely the product of chance. Much more is needed here, and will be demanded as users grow in skill and experience.
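The sort of guidance meant here can be as simple as an automated significance check. Below is a minimal sketch (the sales figures and function name are invented for illustration) of a permutation test: it estimates how often a gap between two groups as large as the one observed would appear by shuffling alone – exactly the warning a "smart" BI tool could surface next to a chart.

```python
import random
import statistics

def permutation_p_value(xs, ys, trials=10_000, seed=42):
    """Estimate the probability that the observed difference in group
    means could arise by chance alone (a two-sided permutation test)."""
    observed = statistics.mean(xs) - statistics.mean(ys)
    pooled = xs + ys
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:len(xs)]) - statistics.mean(pooled[len(xs):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / trials

# Monthly sales for two regions (hypothetical figures)
north = [102, 98, 110, 105, 99, 104]
south = [101, 100, 97, 103, 98, 102]
p = permutation_p_value(north, south)
# A large p-value warns the analyst that the apparent gap between
# the regions may be nothing more than random noise.
```

A BI platform would run this kind of check silently and simply annotate the chart, rather than expose the mechanics to the user.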
Easy to Use Advanced Analytics
BI has moved on from the production of paginated reports to the display of data in a large number of rich formats. This trend is continuing with the introduction of more advanced forms of analytics. However, these will remain shelf-ware unless they are easy to use. Data clustering is a prime candidate, allowing users to see how their data clusters, and hence to segment it in ways that are meaningful. The underlying clustering mechanisms (the most common being k-means) do not have to be made visible. Logistic regression is another mechanism for categorizing data, and again, an easy-to-use interface makes this powerful technique available to business users. Some suppliers already offer these capabilities, and more will join the club.
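To make the hidden mechanism concrete, here is a bare-bones k-means sketch in plain Python (the customer data is invented): assign each point to its nearest centroid, recompute the centroids, and repeat until the assignments settle. A real BI tool would hide all of this behind the interface.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means: alternate nearest-centroid assignment
    with centroid recomputation until convergence."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Index of the centroid with the smallest squared distance to p
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have settled
            break
        centroids = new_centroids
    return centroids, clusters

# Two obvious customer segments: low spend/low visits vs high spend/high visits
data = [(1, 2), (2, 1), (1.5, 1.8), (8, 9), (9, 8), (8.5, 9.2)]
centroids, clusters = kmeans(data, k=2)
```

The business user only needs to choose the data and perhaps the number of segments; everything above the last two lines stays out of sight.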
Taming Data Complexity
A large business will typically process hundreds or thousands of different types of data. These will be held in databases, spreadsheets, files, online applications and cloud services. The trend is toward more data and more data types, and of course this creates greater complexity. Not only are we processing numbers and short text fields, conveniently stored in relational databases, but we now have to process text data (documents, emails, social data etc.), hierarchically structured data (JSON and XML), location data – and so on. As far as a BI user is concerned, the bringing together of these diverse data types is where the real juice is to be found. As such we need BI platforms that simplify and speed up the data ingestion, profiling, transformation and joining processes. This is already happening, and several suppliers utilize machine learning to automatically prepare data. This has to become mainstream, or we will spend an increasing fraction of our time preparing data – a fraction already estimated at between 50% and 80%.
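As a small illustration of what joining diverse data types involves, the sketch below (with invented sample data) flattens hierarchical JSON from a web service and joins it to a relational-style CSV extract on a shared key – precisely the kind of drudgery we would like BI platforms to automate.

```python
import csv
import io
import json

# Relational sales figures, as they might arrive from a database extract
csv_text = """customer_id,revenue
C1,1200
C2,340
"""

# Hierarchical customer profiles, as they might arrive from a web service
json_text = """
[
  {"id": "C1", "profile": {"segment": "enterprise", "country": "DE"}},
  {"id": "C2", "profile": {"segment": "smb", "country": "US"}}
]
"""

# Flatten the nested JSON into one record per customer id
profiles = {
    rec["id"]: {"segment": rec["profile"]["segment"],
                "country": rec["profile"]["country"]}
    for rec in json.loads(json_text)
}

# Join the flat CSV rows to the flattened profiles on the shared key
joined = [
    {**row, **profiles.get(row["customer_id"], {})}
    for row in csv.DictReader(io.StringIO(csv_text))
]
```

Multiply this by hundreds of sources and formats and the case for automated, machine-learning-assisted data preparation makes itself.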
Speed
Computer hardware is the Cinderella in all of this. Without fast processors and large amounts of memory we would still be in the BI dark ages (data warehouses and tabular reports). All contemporary BI platforms exploit in-memory processing to a greater or lesser degree, rich data visualizations depend on fast processors, and in the extreme a platform will exploit the processor chip architecture itself for on-chip processing. This abundance of computing power means more advanced, compute-intensive forms of analytics will become commonly used.
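The essence of in-memory processing is keeping a whole column of values in contiguous memory and scanning it directly, with no disk round-trips. A toy sketch (Python's typed `array` stands in for a real columnar engine; the figures are invented):

```python
from array import array

# One million sales amounts held as a packed, typed column:
# contiguous memory keeps the scan cache-friendly, which is the
# essence of in-memory columnar processing.
amounts = array("d", (i % 100 for i in range(1_000_000)))

# Aggregations run as a single pass over the in-memory column
total = sum(amounts)
average = total / len(amounts)
```

Real engines add vectorized (SIMD) instructions and compression on top, but the principle – keep the data where the processor can reach it fastest – is the same.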
Embedded BI
There is a common misconception that pervasive BI means large numbers of users slicing and dicing data in stand-alone BI platforms. Nothing could be further from the truth – and besides, very few people are afforded the luxury of casually exploring and visualizing data for most of the working day. Pervasive BI is synonymous with embedded BI – BI embedded into the very applications that people use on a day-to-day basis. Once we have got over the current fascination with data visualization, we will put BI onto the production line by embedding it.
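As a toy illustration of embedding, the sketch below (function name and data invented) renders a tiny inline SVG sparkline that a host application can drop straight into its own pages – the user gets their BI inside the screen they already work in, with no separate tool.

```python
def sparkline_svg(values, width=120, height=24):
    """Render a tiny inline SVG line chart that a host application
    can embed directly in its own HTML."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero on flat data
    step = width / (len(values) - 1)
    points = " ".join(
        f"{i * step:.1f},{height - (v - lo) / span * height:.1f}"
        for i, v in enumerate(values)
    )
    return (
        f'<svg width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="steelblue" points="{points}"/></svg>'
    )

# The host application embeds the chart next to the record it describes
weekly_orders = [12, 15, 11, 18, 22, 19, 25]
widget = f"<div>Orders this week {sparkline_svg(weekly_orders)}</div>"
```

In practice the chart would come from the BI platform's embedding API rather than hand-rolled SVG, but the shape of the integration is the same: analytics delivered inside someone else's application.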
Greater Governance
The see-saw movement between centralization and distribution of BI resources has recently swung to an extreme of distribution. This has led to concerns over data security, ambiguity, poor optimization of resources and the creation of islands of information. The inevitable movement over the next few years is toward greater centralized control – otherwise known as governance. Users in the main dislike external controls, but unless an organization wants to descend into BI chaos, these controls are needed. In practice this means stronger authorization, more data security, better optimization of computing resources and a rationalization of the use of BI tools.
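Stronger authorization, at its simplest, is a role-to-resource check applied before a report is served. A toy sketch (the roles, report names and function are purely illustrative, not any real platform's API):

```python
# Which reports each role has been granted (illustrative grants only)
ROLE_REPORTS = {
    "analyst": {"sales_summary", "churn_dashboard"},
    "finance": {"sales_summary", "margin_detail"},
}

def can_view(role, report):
    """Return True only if the role has been granted the report."""
    return report in ROLE_REPORTS.get(role, set())
```

Centralizing even this much – one table of grants instead of ad-hoc rules scattered across departmental tools – is what closes off the islands of information.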
In the longer term BI will embrace many other forms of analysis. Today it is synonymous with reports, charts, dashboards and multidimensional analysis, but the analytic arsenal is much greater than this and includes various forms of classification, prediction, statistical analysis, rules discovery and so on. The suppliers of BI platforms are tasked with making these forms of analysis available in a user-friendly format. After all, we are not expected to understand the algorithm that draws a scatterplot, and we shouldn't be expected to understand how a logistic regression works for it to classify our data.
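To underline the point, here is roughly what that hidden machinery looks like: a bare-bones one-feature logistic regression fitted by gradient descent, with invented churn data. No business user should ever need to read it – which is exactly the argument for user-friendly packaging.

```python
import math

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit a one-feature logistic classifier by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))  # predicted probability
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Classify x: True if the predicted probability is at least 0.5."""
    return 1 / (1 + math.exp(-(w * x + b))) >= 0.5

# Did the customer renew (1) or churn (0), given months of usage?
months = [1, 2, 3, 4, 8, 9, 10, 12]
renewed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(months, renewed)
```

The business user sees only the interface: pick a target column, pick some features, read the classification. The gradients stay backstage.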
The distinction between other forms of analytics (predictive analytics for example) and BI will diminish.