KNIME Review Summary
KNIME offers an open source visual workbench for various forms of data analysis, including visualization, profiling, data transformation, reporting, data mining and the creation of predictive models. Its industrial strength is well demonstrated by its interfaces to, and embedding in, a number of commercial big data and data visualization platforms. Actian, the big data company, embeds KNIME into its product architecture, and third-party interfaces exist to the Spotfire data visualization platform and to Pervasive, the analytics platform (now part of Actian). Dymatrix also offers extensions for KNIME which support real-time model training and scoring for any predictive modeling algorithm.
Built on the Eclipse IDE, KNIME is extensible, and several free extensions are available for molecular and chemical sciences. Skilled users can also build their own extensions, although with hundreds of nodes available for processing, modeling and analyzing data, this is typically not needed. In common with many other platforms (RapidMiner, for example), KNIME embeds all the Weka data mining functionality, and provides a plug-in which allows R scripts to be run.
As a stand-alone workbench, KNIME is free to download and use. It does, however, come with a variety of commercial add-ons which support collaborative work in an enterprise environment. These include:
- KNIME Personal Productivity – supports the management of metanodes (a metanode is a collection of nodes that performs a particular task). The LocalSpace Repository provides a repository for registering metanodes, which can then be incorporated into other workflows via drag and drop.
- KNIME Partner Productivity – locks and encrypts metanodes so that the internals cannot be seen. This is obviously necessary in some commercial environments where IP needs to be protected.
- KNIME TeamSpace – allows collaborative sharing, with a shared workflow repository, shared data space, and shared metanodes.
- KNIME Server Lite – adds the ability to define user access rights, and allows workflows to be scheduled for execution.
- KNIME Server – provides an enterprise environment for the distribution of workflows so that authorized users can access and execute them as necessary. Workflows can be integrated into other applications using web services, and they can also be scheduled for execution. A web portal gives access to workflows, with versioning control, a shared workflow repository, shared data spaces, shared metanodes, and support for report generation.
- KNIME Big Data Extension – provides a set of nodes for accessing Hadoop/HDFS via Hive from within KNIME. Cloudera, Hortonworks and MapR distributions are all supported.
- KNIME Cluster Execution – supports the execution of KNIME jobs in a clustered environment.
KNIME Analytics Platform 3.0 has just been announced. The major features of the early access release are the new user interface and updates to all of the underlying libraries, including updates to Java 8, Eclipse 4 and BIRT 4.
The working environment of KNIME is primarily drag-and-drop, and highly visual. It caters for everyone from novice users who might simply want to visualize data through to data scientists who need to build complex workflows and sophisticated predictive models. KNIME is the premier open source analytics platform still being actively developed, and it looks like it will stay that way.