Platfora Review Summary
Platfora provides an end-to-end big data discovery and exploration platform that starts at data ingestion and ends with visualisation. In many ways it is the right product at the right time. Had Platfora tried to deliver its platform just four or five years ago, it would have been shooting at a moving target, since big data technologies were very immature. Today, however, the growing use and acceptance of Spark in-memory processing, together with a maturing Hadoop, means that Platfora can deliver massively scalable data exploration and discovery tools that overcome many of the problems associated with traditional data warehousing platforms. Traditional warehouses are built with predefined needs in mind, using limited data sets, and so are inflexible and slow to design, build and use. A three-month latency between need and capability is common with these platforms.
Platfora uses the Hadoop Distributed File System (HDFS) as a data store, ingesting data from transaction-based systems, devices, external data feeds, and so on. Data is catalogued and prepared using Spark machine learning and in-memory processing. In practice, users see the connections between data sources and work with data that has already been prepared for analysis.
The Platfora architecture sits on top of the Hadoop platform and provides scalable in-memory processing to handle large queries at speed. Much data preparation is handled automatically as data is taken from its raw state and moved to a structured columnar database within Platfora. Users get to see samples of the prepared data and can override faulty interpretations. Platfora can then create ‘Lenses’ or data marts on an ad-hoc basis as needed – which is of course the silver bullet many business analysts are looking for.
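The idea of automatic preparation with user overrides can be illustrated with a minimal sketch. This is not Platfora's implementation; the function names (`guess_type`, `infer_schema`), the type labels and the sample columns are all hypothetical, and the rules are deliberately simple. The sketch guesses a type for each column from sample values, then lets a user correct a faulty guess, such as a postal code that looks numeric.

```python
from collections import Counter
from datetime import datetime


def guess_type(value: str) -> str:
    """Guess a type name for one raw string value (hypothetical rules)."""
    text = value.strip()
    if text == "":
        return "null"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        pass
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "string"


def infer_schema(rows, overrides=None):
    """Pick the most common non-null guess per column, then apply user overrides."""
    votes = {}
    for row in rows:
        for column, value in row.items():
            guess = guess_type(value)
            if guess != "null":
                votes.setdefault(column, Counter())[guess] += 1
    schema = {column: counts.most_common(1)[0][0] for column, counts in votes.items()}
    # A user who spots a faulty interpretation in the sample can override it here.
    schema.update(overrides or {})
    return schema


# Hypothetical sample of raw, untyped records.
sample = [
    {"order_id": "1001", "zip": "02139", "amount": "19.99", "day": "2016-03-01"},
    {"order_id": "1002", "zip": "94105", "amount": "5", "day": "2016-03-02"},
    {"order_id": "1003", "zip": "10001", "amount": "12.50", "day": ""},
]

inferred = infer_schema(sample)
corrected = infer_schema(sample, overrides={"zip": "string"})
```

Here the automatic pass would treat `zip` as an integer, which loses the leading zero; the override restores it to a string, which is the kind of correction the sample preview is meant to make easy.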
Given a suitable lens, Platfora can then be used to visualise data as required. To this end the Platfora Vizboard engine supports visualisation of millions of data points, with the zooming, drill-down, panning and filtering functions typical of a product of this nature.
This is an integrated platform that largely avoids having to fiddle around with MapReduce, Hive, Pig, or any other time-consuming and complex technologies. It largely delivers on the promise of big data – ad-hoc access to very large amounts of data, with minimum latency, so that business users can explore and discover what the data is saying.
The visualisations in Platfora (Vizboards) are rendered in HTML5, and so can be viewed in any modern browser. However, unlike many data visualisation platforms, Vizboards can handle millions of data points. Many different types of visualisation are supported, including maps, charts, graphs, dials and tables. The interface is primarily drag and drop, with filtering, sorting, grouping and drill-down functionality.
Key to the ease of visualisation is the Platfora catalog. It is here that data resources are listed and documented, making it straightforward for users to access and understand the data they need.
The architecture of Platfora is refreshingly straightforward. As can be seen in the diagram below, it primarily consists of just four layers – the Hadoop HDFS platform for data storage, the Spark-based data preparation layer, the in-memory architecture for creating data marts (or lenses as Platfora calls them), and finally the data visualisation platform. A catalog provides a semantic view of information, and an API allows external access.
Deployment can be cloud-based or on-premises. The Platfora servers are separate from, but live alongside, the Hadoop platform. In the cloud, and specifically on Amazon Web Services, Platfora uses Amazon Simple Storage Service (S3) to access the raw data and Amazon Elastic MapReduce (EMR) to run its data processing jobs (lens builds). The results of the lens build jobs are also written back to S3.
The Lens Builder (that is, the data mart builder) sits over Hadoop and translates requests from the Platfora application to prepare data for analysis into a series of custom Spark and MapReduce jobs. These are submitted to the YARN Resource Manager or Hadoop Job Tracker for execution. Once the data is extracted and transformed within Hadoop, the job results are written back to the Hadoop file system as a Platfora lens. The lens is a Platfora columnar file format for storing high-performance in-memory extracts of data with analysis applied.
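To make the notion of a lens more concrete, here is a minimal sketch of what a columnar, pre-aggregated extract looks like in principle. It is an illustration only: the `build_lens` function, the field names and the in-memory dictionary layout are assumptions made for this example, and the actual Platfora lens format and the Spark or MapReduce jobs that produce it are not described in enough detail here to reproduce.

```python
from collections import defaultdict


def build_lens(rows, dimensions, measure):
    """Aggregate raw rows by the chosen dimensions and return a columnar extract."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]

    # Column-oriented layout: one list per field rather than one dict per row.
    lens = {d: [] for d in dimensions}
    lens[f"sum_{measure}"] = []
    for key in sorted(totals):
        for d, value in zip(dimensions, key):
            lens[d].append(value)
        lens[f"sum_{measure}"].append(totals[key])
    return lens


# Hypothetical raw events, standing in for data held in HDFS or S3.
raw_events = [
    {"region": "EU", "product": "A", "revenue": 10.0},
    {"region": "EU", "product": "A", "revenue": 5.0},
    {"region": "US", "product": "A", "revenue": 7.5},
    {"region": "US", "product": "B", "revenue": 2.5},
]

lens = build_lens(raw_events, dimensions=["region", "product"], measure="revenue")
```

The point of the layout is that a visualisation asking for revenue by region only needs to scan two short columns of already-aggregated values, rather than every raw event.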
There is much more to the Platfora architecture, but this is probably enough for a short review. It is a highly innovative solution to the decades-old problem of giving business users access to data which, in its raw form, is not particularly suited to analysis.