ClearStory Review Summary
ClearStory provides a cloud-based platform that takes business users from data to visualization with minimal need for technical skills. In common with several other suppliers (Tamr, Paxata, Platfora and others), it uses machine learning techniques and Apache Spark in-memory processing to take data from its raw state to a state where business users can create the data visualizations they need. As data sources become more diverse, businesses are using the Hadoop big data platform as a ‘data lake’, and ClearStory increasingly supports this and many other data sources.
Users are presented with a collaborative environment where data discovery, exploration and visualization efforts can be shared via StoryBoards. These provide one or more visualizations in the context of a story line, and various users can annotate and comment as the StoryBoard evolves.
Terms such as data wrangling, data blending, data curation and governance are increasingly used to describe the effort needed to handle complex data. While we all want easy-to-use data visualization tools, it is the data preparation, and all that goes with it, that determines the success or otherwise of data visualization. To this end ClearStory is very well positioned for the growing need to handle complex data environments in a data visualization context.
Data Visualization
The user interface provides a contemporary drag-and-drop graphical environment for creating a large range of visualizations, including maps, charts, graphs, tables and other artifacts. These can be assembled into dashboards, which in turn can be incorporated into a StoryBoard. Context is always maintained, especially data context, so users know the profile of the data used for a particular visualization. As users select the data they want to visualize, ClearStory suggests visualizations, which can be accepted as-is or modified as needed.
Data Preparation
It is well known that data preparation consumes the largest share of the time and effort in any data analysis task. ClearStory exploits machine learning methods in a Spark environment to automate much of this effort. This is not a ‘black box’, and users are guided as data is prepared for use. ClearStory will infer what data means (identifying date fields, numeric data and so on), and combine data from diverse sources based on its understanding of the data.
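To give a feel for what inferring the meaning of data involves, here is a minimal sketch in Python. It is not ClearStory's method: the product is described as using machine learning on Spark, whereas this illustration substitutes a few simple rules. The function names, the list of date formats and the sample rows are all assumptions made for the example.

```python
from datetime import datetime

# Assumed date formats for this illustration; a real tool would handle far more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def is_number(value):
    """Return True if the string parses as a floating point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def is_date(value):
    """Return True if the string matches one of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_column_type(values):
    """Classify a column of raw strings as 'numeric', 'date', 'text' or 'empty'."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "empty"
    if all(is_number(v) for v in cleaned):
        return "numeric"
    if all(is_date(v) for v in cleaned):
        return "date"
    return "text"


def infer_schema(rows):
    """Infer a type for every column in a list of dict rows (hypothetical helper)."""
    columns = rows[0].keys()
    return {c: infer_column_type([r[c] for r in rows]) for c in columns}


# Hypothetical sample data, invented for the example.
rows = [
    {"order_date": "2015-03-01", "amount": "19.99", "region": "West"},
    {"order_date": "2015-03-02", "amount": "5", "region": "East"},
]
print(infer_schema(rows))
```

Once column types are known, columns of the same type from different sources become candidates for joining, which is the kind of blending described above. A rule-based approach like this breaks down quickly on messy data, which is presumably why products in this space turn to machine learning.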
Data sources range from Hadoop through to relational databases, files and external data sources. The latter are often used to enrich and augment data, and ClearStory is particularly capable of blending open data sources into an analytical task. Such external sources might include census data, business registrations, field surveys, media and market intelligence, and macroeconomic data such as GDP growth. A ‘Data You May Like’ library of curated resources gives users direct access to various data sources.
The Competition
Because the scope of ClearStory is so broad, it can be considered to have many competitors. On the visualization side it might be compared with products such as Tableau, Qlik Sense and Spotfire. These products, however, do not embrace the data preparation capabilities of ClearStory. Products such as Tamr and Paxata can be compared for data preparation, but these again do not incorporate much in the way of collaboration and visualization. Perhaps the nearest competition comes from Platfora, which similarly takes users from data to visualization and employs a Spark-based machine learning platform for data preparation.