We visualize data in charts and graphics because we need information that is not readily available in tables of numbers or written reports. Until recently we have let the computer dictate how information is presented to us. Computers like numbers and text: they are easily handled, take up little storage space, and require very little processing. It’s a bit like going to a restaurant for a special meal and being served potatoes just dug out of the ground and a fish swimming around in a tank. Data visualizations, on the other hand, require that our computers work hard and handle data with much greater skill. In effect we are insisting that the computer speak our language – a visual language.
Human beings are very good at processing visual information. In fact no computer can equal the speed with which we identify meaning in a visualization. This is actually a double-edged sword: not only do we see meaningful patterns in data, but we can also jump to conclusions that have no correspondence with reality at all. Even so, the saying that ‘a picture paints a thousand words (or numbers)’ is very apt.
This desire to get insights from data can be driven by two needs. The first is the need to address some form of uncertainty – to answer a question. Why were sales in a certain region particularly high last quarter? How did various production units perform last month? This has been the dominant use of data visualization until recently, and is synonymous with the role of business intelligence (BI) – looking in the rear-view mirror to describe and diagnose historical performance. Of course we can also look at current performance, and dashboards displaying a set of key performance indicators (KPIs) do this very well. Even so, we are still looking at historical data. Other forms of analytics allow us to get a peek into the future, but that is another story.
The second reason we might visualize data is more concerned with exploration, which allows us to discover facts we never suspected – that every time we introduce an upgraded product, sales of a related product fall, for example. Data exploration has taken on new significance with the emergence of big data and the need to find predictive patterns in it. Exploring data allows analysts to find which variables are most important. However, data exploration is also important as a means of gaining further insights into how the business works.
Data visualization can also be driven by the need to find quantitative or qualitative facts. Quantitative facts include things such as averages, distributions of values (in a histogram, for example), dispersion of values (a standard deviation), maximum and minimum values, quartiles, ratios (how last month’s sales compare with the average) – and so on. These hard numbers are clearly useful, but qualitative insights often provide much greater context. A visualization showing sales over the last two years may reveal seasonal tendencies, a trend, or increasing fluctuations. No single number could represent discoveries of this nature. When we start with single numbers and move toward a broader picture, the analysis process is bottom-up – from detail to general. When we move from the broad picture to specific numbers, the approach is top-down. Which of these is used depends on the context: if there is a general problem with sales, the top-down approach might be best; if we have spotted a few anomalies, the bottom-up approach probably will be.
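To make the quantitative facts concrete, here is a minimal sketch in Python, assuming pandas and matplotlib are available. The sales figures and series name are invented purely for illustration.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical monthly sales figures covering two years (24 values).
    sales = pd.Series(
        [210, 195, 240, 260, 255, 300, 310, 290, 270, 250, 230, 280,
         220, 205, 255, 275, 270, 320, 335, 310, 285, 265, 245, 300],
        name="monthly_sales")

    # The hard numbers: average, dispersion, extremes, quartiles, a ratio.
    print("mean:", sales.mean())
    print("standard deviation:", sales.std())
    print("min / max:", sales.min(), sales.max())
    print("quartiles:", sales.quantile([0.25, 0.5, 0.75]).tolist())
    print("last month vs average:", sales.iloc[-1] / sales.mean())

    # The qualitative picture: the distribution shown as a histogram.
    sales.plot(kind="hist", bins=8, title="Distribution of monthly sales")
    plt.xlabel("Sales")
    plt.show()

The print statements deliver the single numbers discussed above, while the histogram is where qualitative features such as skew or clustering of values become visible.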
Why Now?
Two things are driving the use and availability of data visualization tools. The first is that there is more data – much more. Cheap storage and a proliferation of data sources (social data, transaction data, web site data, online data resources) have seen to that – but this is at the ‘problem’ end of the spectrum: more data is a problem unless we have the means to use it. The other driver is the increasing power of hardware. As we have already mentioned, processing visuals is hard work for computers. Powerful CPUs allow computers to handle the hundreds or thousands (and in some cases millions) of data points that might be represented in a data visualization. Of equal importance is the availability of cheap memory and the ability to address it – 64-bit architectures are key here. Many of the new data visualization tools extract data from large databases and load it into the memory of a desktop computer; gigabytes of data can be held in local memory and processed with great speed. Columnar databases and database compression also help, meaning users can slice and dice their data with ease. And since processor speeds and memory availability are only going to increase, data visualization tools are set to become much more powerful.
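The in-memory pattern described above can be sketched in a few lines of Python with pandas and SQLAlchemy. The connection string, table name, and column names are all assumptions for illustration; the category conversion is a desktop-scale analogue of the dictionary encoding used by columnar, compressed databases.

    import pandas as pd
    import sqlalchemy

    # Hypothetical warehouse connection and fact table.
    engine = sqlalchemy.create_engine("postgresql://user:password@host/warehouse")

    # Pull the whole table into the desktop machine's memory in one go.
    df = pd.read_sql_table("sales_facts", engine)

    # Dictionary-encode repeated strings – an analogue of columnar
    # compression – which can cut memory use dramatically.
    df["region"] = df["region"].astype("category")
    df["product"] = df["product"].astype("category")

    # With everything in RAM, slicing and dicing runs at interactive speed.
    revenue_by_region = df.groupby("region", observed=True)["revenue"].sum()
    recent = df[df["order_date"] >= "2024-10-01"]

Once the extract is in local memory, every filter, pivot, and aggregation is a RAM operation rather than a round trip to the database – which is what makes the interactive feel of these tools possible.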
Future Trends
Right now most business users are delighting in the ability to see their data in ways that are genuinely new. But as the ancients would say – stasis is death. If your visualization arsenal still consists of histograms and bubble charts five years from now, then the insights you are gaining are probably no longer competitive (and what is the use of a data visualization unless it lends some form of competitive insight?). A number of data visualization tools already incorporate advanced analytics capability. A good example is clustering – grouping customer profiles so that new customers can be offered the products they are most likely to buy. Many more advanced analytics methods will be incorporated into data visualization tools, but ease of use will decide whether they are used at all. An easy-to-use interface is essential for business users, who have neither the time nor the inclination to understand the intricacies of clustering or logistic regression techniques. This means suppliers of data visualization tools will have to work hard to build intelligence into the user interface. Right now most data visualization tools are effectively dumb, offering very little guidance on what a visualization means. Smart data visualization technology will provide information such as the probability that a feature is meaningful. This will be advisory in nature since, as we have already mentioned, humans are very good at interpreting data visualizations in a way that computers simply cannot.
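As a hedged sketch of the clustering idea, here is what a tool might do behind its interface, expressed in Python with scikit-learn. The customer features (age, annual spend, visits per month) and the choice of three clusters are invented for illustration.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical customer profiles: [age, annual spend, visits per month].
    customers = np.array([
        [25, 1200, 8], [31, 1500, 6], [42, 5200, 2], [39, 4800, 3],
        [55, 9000, 1], [60, 8700, 1], [28, 1300, 7], [47, 5100, 2],
    ])

    # Scale the features so no single one dominates the distance metric.
    scaler = StandardScaler().fit(customers)
    X = scaler.transform(customers)

    # Group the existing customers into three clusters.
    model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print("cluster per customer:", model.labels_)

    # A new customer is assigned to the nearest cluster, and can then be
    # offered the products popular within that cluster.
    new_customer = scaler.transform([[35, 2000, 5]])
    print("cluster for new customer:", model.predict(new_customer)[0])

The point of building this into the user interface is that none of these steps – scaling, choosing a cluster count, fitting – should be visible to the business user; they should simply see the grouped customers.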
The other area where analytics tools need to improve is the preparation of data. Data are no longer exclusively stored in neat rows and columns – aka the relational database. Data from logs, social platforms, location services, and so on might be stored in any number of formats. Pulling together data from diverse sources is becoming more of a headache, and so greater automation needs to be put in place. This is already happening to some extent, with intelligence (machine learning specifically) being embedded into a new genre of data visualization tool. This intelligence supports the automatic joining of data from multiple sources, identifies outliers, suggests transformations – and so on. If data visualization is to become a frictionless activity, much more needs to happen here. The alternative is business analysts and users spending inordinate amounts of time understanding what data is actually available, and then figuring out ways to clean, transform, and merge it.
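To illustrate the kind of chores such tools automate, here is a small Python sketch using pandas: joining two sources on a shared key and flagging an outlier. The source names, columns, and z-score threshold are all assumptions.

    import pandas as pd

    # Two hypothetical sources: CRM records and web activity logs.
    crm = pd.DataFrame({
        "customer_id": range(1, 9),
        "segment": ["retail"] * 5 + ["enterprise"] * 3,
    })
    web = pd.DataFrame({
        "customer_id": range(1, 9),
        "monthly_visits": [14, 9, 11, 12, 10, 13, 8, 400],
    })

    # Join on the shared key – a smart tool would suggest this join itself.
    merged = crm.merge(web, on="customer_id", how="inner")

    # A simple z-score rule flags the 400-visit record as an outlier.
    visits = merged["monthly_visits"]
    merged["outlier"] = (visits - visits.mean()).abs() > 2 * visits.std()
    print(merged)

Done by hand across dozens of sources, this is exactly the tedious work described above; done automatically, with sensible joins and outlier flags suggested up front, data visualization moves much closer to being frictionless.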
Data Visualization Platforms
Chartio – simple to use, ideal for many business data visualization needs. Cloud-based service.
ClearStory – from data preparation to data visualization, with automated data prep capabilities.
Datameer – data preparation, big data ecosystem, data visualization, predictive analytics.
Datawatch – handles diverse data sources, real-time data visualizations and dashboards.
GoodData – cloud-based BI platform with simple-to-use data visualization tools.
IBM Cognos – not as easy to use or sophisticated as many other platforms, but part of a broader BI capability.
Inetsoft – powerful data visualization, but not as easy to use as some other products. A free version with limited functionality is available.
Lavastorm – powerful data preparation, data visualizations, analytical models with runtime environment.
Logi Analytics – excellent guided visual analytics platform with production BI capabilities.
Looker – not particularly easy to use, but supports complex logic and sophisticated visualizations.
Microsoft Power BI – easy-to-use Power BI Desktop for data visualization, collaboration via the cloud-based Power BI Service, powerful when part of the Microsoft ecosystem.
MicroStrategy – free desktop data visualization platform, and part of a much broader enterprise BI suite.
Platfora – sophisticated data preparation and easy-to-use data visualization tools. Big data exploration and discovery platform.
Qlik Sense – easy-to-use data visualization platform, very good data discovery and extensibility.
SAP Lumira – somewhat limited as a standalone product, but more useful when combined with other SAP products.
SAS Visual Analytics – very easy to use with guided interface and optional advanced analytics.
Sisense – very fast data visualization platform with easy-to-use visualization tools.
Spotfire – easy-to-use data visualization platform, but with very advanced analytics capability if needed.
Tableau – relatively easy-to-use data visualization platform with a rich set of charts.
Yellowfin – good all-round BI platform with easy-to-use data visualizations.