Big Data Visualization Tools 2015
Big data visualization tools depend on their ability to connect efficiently to big data databases and to manage the data in a usable format. The visualization tools themselves typically facilitate the creation of charts and dashboards, but some offer more advanced forms of analytics such as predictive analytics and statistical analysis. The big data visualization tools listed below offer diverse solutions to the visualization need. Lumify and ZoomData are particularly interesting platforms, and Spotfire, Pentaho and 1010Data provide more advanced analytics capability. SAS, IBM, Oracle and SAP have not been included since these are usually only of interest to users with a large investment in these technologies.
1010Data is a complete cloud-based big data visualization and analytics solution (an Analytics Platform as a Service – APaaS). It uses a columnar database for high-speed querying, reporting and analytics, and provides a number of interfaces for data analysis. The most novel is the ‘Trillion Row Spreadsheet’, which operates in the web browser and supports data discovery and analytics in a spreadsheet environment. The platform includes an array of advanced, built-in analytic functions including statistics (distribution analysis, correlation, variance), predictive modeling and forecasting (linear and multivariate regression, logistic regression) and machine learning.
Datameer is an end-to-end big data discovery and analytics platform, purpose-built for Hadoop. The data integration capability means data from diverse sources (Excel, text, databases) can be loaded into Hadoop, either as a one-off or as a continuous stream. The lack of any pre-defined schemas means data can be accessed in an ad hoc manner, and some 250 analytic functions accommodate most forms of analytics (clustering, decision trees and apps such as a recommendation engine). Dashboards and charts are built using the Business Infographic Designer, and the resulting reports and graphics update when data changes in the Hadoop database. Visualizations are generated in HTML5, so they can be viewed on almost any device. Datameer also offers an integration with Tableau, if that is the preferred visualization platform.
GoodData is a cloud SaaS analytics platform which connects to a large number of data sources – big data, databases, online apps, social data, and so on. It provides simple mechanisms to create charts and dashboards, and is extensible through its ability to run analytic functions written in R directly within the database. The multi-tenant, multi-layered architecture scales independently for each customer at each layer of the architecture which spans hundreds of servers, tens of terabytes of RAM and real-time in-memory analytics in GoodData’s private cloud. The platform features a dual-stage Data Storage Service (DSS) that stores raw data in a Hadoop-based system as well as a highly-scalable, clustered columnar warehouse for enriched, analysis-ready data.
Lumify is an open source project to create a big data fusion, analysis, and visualization platform designed for anyone to use. Its intuitive web-based interface helps users discover connections and explore relationships in their data via a suite of analytic options, including 2D and 3D graph visualizations, full-text faceted search, dynamic histograms, interactive geographic maps, and collaborative workspaces shared in real-time. As an open source project, Lumify features are evolving all the time. Below is an explanation of the primary ones.
- Search – Lumify provides a full-text search over everything in your graph. You can also use custom filters built from properties defined in your ontology.
- Graph Visualization – The primary feature of Lumify is the graph visualization. Lumify provides both 2D and 3D graph visualizations with a variety of automatic layouts.
- Link Analysis – Lumify provides a variety of options for analyzing the links between entities on the graph. A right-click menu on any entity allows you to display all related entities, find paths to another entity, and establish a new relationship (i.e. a connection) to another entity.
- Geospatial Analysis – Lumify provides the ability to integrate any OpenLayers-compatible mapping system, such as Google Maps or ESRI, for geospatial analysis. Any data tagged with location information can be aggregated and viewed on a map.
- Multimedia Analysis – Out of the box, Lumify comes with specific ingest processing and interface elements for textual content, images, and videos.
- Collaboration – Lumify’s spaces feature allows users to organize work into a set of projects, or workspaces. Each space can be individually shared in read-only or edit mode with other Lumify users. Changes to the space are immediately propagated to everyone sharing the workspace without needing to refresh browser windows.
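The link analysis operations described above (display related entities, find a path to another entity) correspond to standard graph queries. The following is a minimal Python sketch of that idea using breadth-first search. It is not Lumify's code or API: the entity names, the `links` data, and the `related` and `find_path` helpers are all hypothetical and exist only for illustration.

```python
from collections import deque


def related(graph, entity):
    """Return the entities directly linked to `entity`, sorted by name."""
    return sorted(graph.get(entity, ()))


def find_path(graph, start, goal):
    """Return one shortest path from start to goal, or None if unconnected."""
    if start == goal:
        return [start]
    previous = {start: None}  # maps each visited entity to the one before it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = node
            if neighbor == goal:
                # Walk backwards from the goal to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical entities and relationships (undirected links).
links = [("alice", "acme"), ("acme", "bob"), ("bob", "carol"), ("dave", "erin")]
graph = {}
for a, b in links:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(related(graph, "acme"))
print(find_path(graph, "alice", "carol"))
```

Breadth-first search is used here because it returns a path with the fewest hops, which is usually what an analyst wants when asking how two entities are connected. A production graph store would run such queries server-side rather than in application code.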
Pentaho comes with native connectivity to a wide range of NoSQL and big data databases (MongoDB, Cassandra, DataStax, HBase, Elasticsearch), as well as more conventional data sources. The Pentaho adaptive big data layer takes advantage of the specific and unique capabilities of each source. It supports a wide range of interactive visualizations with drill through, lasso filtering, zooming, and attribute highlighting. The analytics capability is particularly sophisticated, with algorithms for classification, regression, clustering and association, and these algorithms can be incorporated into Pentaho’s visual interface.
Qlik provides two analytics products – QlikView for production analytics and Qlik Sense for self-service analytics. The latter is the newer product and comes with the associative data engine that makes ad hoc data visualization so productive. This engine understands the relationships between various data sources, and as users explore their data, it suggests links with data that might not have been previously considered. Building charts and dashboards is quite straightforward, and the fully functional desktop version can be downloaded for free. A cloud-hosted version is also available.
Sisense emphasizes the speed of its technology, and uses an in-memory columnar database to integrate data from diverse sources. Various hardware and system software techniques are used to enhance performance and provide a unified picture of data. Charts, reports and dashboards are easily created, and users find the interface easy to use and productive. The resulting visuals, dashboards in particular, can be shared and are fully interactive. Mobile devices are supported and users can set up alerts to view changes as they occur. Sisense also emphasizes embedded analytics, where charts and dashboards can be used within production applications. Connectors are available for big data sources, online data and databases.
Spotfire supports easy-to-build data visualization, text analytics, predictive analytics and statistical analysis. A large number of big data sources are supported, and the clever use of memory means most visualizations execute very quickly. More modest data sets are held in memory, larger data sets can be processed in the database, and a hybrid mode pulls data into memory as needed. Dashboards and charts are easy to build, with visualizations recommended based on the data being processed. Spotfire comes as a stand-alone Windows desktop application, as the server-based Spotfire Platform supporting web authoring and viewing of visualizations, and as Spotfire online – hosted in the cloud.
Tableau is well known for its ease of use and attractive visuals. It comes in four versions. Tableau Desktop runs on Windows and Mac and processes data in memory for modest data sets, or connects directly to data sources for larger data sets. Tableau Server comes with community and visualization sharing capability, and a managed environment for departmental and enterprise use. Tableau Online is a cloud-hosted version of Tableau Server, and Tableau Public supports the creation of charts which can be embedded into web sites. Tableau is a pure data visualization platform with limited analytics, unless users wish to use the R integration facility.
ZoomData connects to most data sources – big data, database, text data, and so on. It also supports in-memory processing in the form of Spark and Databricks. The connectors are a real feature, and the speed of processing is in part due to native connectors to Cloudera Impala, Amazon Redshift and MongoDB. Its micro-query facility means users do not have to wait for a large query to complete, because the visualizations are built incrementally as data are processed. The Visualization Studio uses various JavaScript libraries (D3, Leaflet, NVD3) and users can include other libraries if needed. Text search is particularly sophisticated with faceted search and filters (Elasticsearch and Solr are supported). It supports the creation of complex dashboards and the embedding of visualizations into production applications.
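The idea of building a visualization incrementally, rather than waiting for one large query to finish, can be illustrated with a short Python sketch. This is not ZoomData's implementation: the `incremental_counts` function, the `region` field, and the sample rows are hypothetical, and a real system would stream chunks from the data source rather than slice an in-memory list.

```python
from collections import Counter


def incremental_counts(rows, key, chunk_size):
    """Yield a running tally after each chunk so a chart can redraw early."""
    totals = Counter()
    for start in range(0, len(rows), chunk_size):
        # Process one small slice of the data at a time.
        for row in rows[start:start + chunk_size]:
            totals[row[key]] += 1
        # Hand back a snapshot of the totals seen so far.
        yield dict(totals)


# Hypothetical sample data standing in for a large table.
rows = [{"region": r} for r in ["east", "west", "east", "north", "west", "east"]]

snapshots = list(incremental_counts(rows, "region", chunk_size=2))
for step, snapshot in enumerate(snapshots, start=1):
    print(step, snapshot)
```

Each snapshot is a partial result that a chart could draw immediately, and the final snapshot matches what a single full query would return. The trade-off is that early snapshots are approximate, so a real interface would normally indicate that data is still loading.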