How to Exploit Complex Data



Article sponsored by Sisense

Businesses are gaining insights from data that simply were not possible just a few years ago. This capability is powered by a new breed of data visualisation tools, and the ability to morph complex data into a form that is suitable for analytical purposes. While most of the focus tends to rest on the visualisation aspects, it is the ability to prepare complex data from multiple sources that facilitates deeper insights into business operations and opportunities.

Most businesses possess a large inventory of data sources. Transactional data from operational activities tends to be buried in the associated applications, but newer sources such as social data, web click-stream data, online data sources and location data add dimensions to analysis that give much deeper insights into customer behaviour and market opportunities. Of course, this proliferation of data sources brings increased complexity, and it is essential that analytical tools are capable of mapping this complexity onto a simple view for analytical purposes. Data preparation tasks already account for anywhere between 50% and 80% of any analytical process, and proliferating, diverse data sources only serve to increase this fraction. Business intelligence (BI) platforms that reduce this overhead are not only desirable, but absolutely essential if analytics is to move at the speed of the business.

The traditional approach to the preparation of data for analytical purposes usually involves the creation of a data warehouse and/or data marts. These are pre-configured for a well-defined set of analytical tasks, but are high latency, relatively inflexible and expensive. These and other approaches were based on the ‘schema at store’ approach to data (more commonly called ‘schema on write’): data could only be handled if it fitted into a format that had already been configured within the data warehouse. Today this is no longer necessary. BI platforms are available that support ‘schema on read’; in other words, the layout and format of the data is created on-the-fly to meet ad-hoc needs.
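To make the ‘schema on read’ idea concrete, the following is a minimal sketch in Python. It is purely illustrative and is not how any particular BI platform implements the idea: the event records, the field names and the `read_with_schema` helper are all assumptions invented for this example. The point it shows is that the raw data is stored as it arrived, and each analysis imposes its own layout and types at the moment of reading.

```python
import json

# Hypothetical raw click-stream events, stored exactly as they arrived.
# No schema was enforced when they were written, so fields vary per record.
RAW_EVENTS = [
    '{"user": "a1", "page": "/home", "ms": "120"}',
    '{"user": "b2", "page": "/pricing", "ms": "340", "referrer": "search"}',
    '{"user": "a1", "page": "/pricing"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: select fields, cast types, fill defaults.

    `schema` maps a field name to a (cast_function, default_value) pair.
    This helper is an illustrative assumption, not a real library API.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else default
        rows.append(row)
    return rows


# Two different ad-hoc views over the same stored data.
LATENCY_SCHEMA = {"page": (str, ""), "ms": (int, 0)}
REFERRAL_SCHEMA = {"user": (str, ""), "referrer": (str, "direct")}

latency_rows = read_with_schema(RAW_EVENTS, LATENCY_SCHEMA)
referral_rows = read_with_schema(RAW_EVENTS, REFERRAL_SCHEMA)
```

Neither view required the stored events to be reloaded or restructured; a third question tomorrow would simply need a third schema.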

This new capability is enabled by new types of database management system (particularly columnar databases), greater memory (RAM), and much more powerful processors. In one case this exploitation of contemporary hardware has been taken to an extreme by the clever use of very fast memory embedded in the actual processor. The net result is that users can wrangle their data resources into a usable format as and when they need to. For example, in-memory databases are utilised by Tableau, Qlik and Spotfire. Sisense uniquely exploits in-chip memory for very fast processing.

If the new sources of data (operational logs for example) were formatted in the familiar rows of relational databases, the data preparation task would be much simpler. However, the new sources of data tend to be held in high performance data stores, in a format that suits near real-time demands. This can impose additional overheads when the data is accessed, because it has to be ‘decoded’ and essentially flattened out into the familiar row format. It is essential that a BI platform can perform this and other forms of conversion, as and when needed. Once this has been accomplished, it is also useful for data to be joined where appropriate, and for complex transformations (involved financial calculations, for example) to be applied. Tools which assist in this process can reduce the data preparation time considerably.
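The flatten, join and transform steps described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the nested order documents, the customer lookup table and the helper names `flatten` and `explode_orders` are hypothetical, and a real BI platform would do this work at far greater scale and with its own engine.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one level, using dotted key names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def explode_orders(doc):
    """Return one flat row per line item, repeating the parent order fields."""
    parent = flatten({k: v for k, v in doc.items() if k != "items"})
    return [{**parent, **flatten(item, prefix="item.")} for item in doc["items"]]


# Hypothetical nested documents, as a non-relational store might hold them.
ORDER_DOCS = [
    {"order_id": 1, "customer": {"id": "c1", "geo": {"country": "UK"}},
     "items": [{"sku": "A", "qty": 2, "price": 5.0},
               {"sku": "B", "qty": 1, "price": 12.5}]},
    {"order_id": 2, "customer": {"id": "c2", "geo": {"country": "DE"}},
     "items": [{"sku": "A", "qty": 1, "price": 5.0}]},
]

# Hypothetical second source, keyed by customer id, to join against.
CUSTOMERS = {"c1": {"segment": "retail"}, "c2": {"segment": "wholesale"}}

rows = []
for doc in ORDER_DOCS:
    for row in explode_orders(doc):
        # Join: look up the customer segment from the second source.
        row["segment"] = CUSTOMERS.get(row["customer.id"], {}).get("segment")
        # Transformation: a simple derived financial value per line item.
        row["line_total"] = row["item.qty"] * row["item.price"]
        rows.append(row)
```

Each element of `rows` is now a flat record with columns such as `order_id`, `customer.geo.country`, `item.sku`, `segment` and `line_total`, which is the row format that conventional analytical tools expect.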

The net result of exploiting the power of contemporary hardware architectures to wrangle data on-the-fly is transparent access to data from widely diverse sources, and the ability to merge it to give new business insights. As long as a BI platform provides the necessary governance, authorisation and security, business users can respond to urgent business needs with very low latency (often measured in minutes), and with a depth of insight that simply was not possible before. The ongoing evolution of computer hardware, database technology and data visualisation platforms will be absolutely necessary to deal with the rapid growth in the volume, diversity and speed of new data sources, including ‘big data’. Dealing with data complexity should be high on the list of priorities as businesses select and deploy their BI platforms.

Sisense offers a free trial.