Revolution Analytics Review Summary
Microsoft’s acquisition of Revolution Analytics was not just a good move; it was a brilliant one. Revolution Analytics supplies an enterprise-grade version of the open source R analytics language, together with runtime improvements, support for big data, developer tools, and support, consulting and training services. I placed Revolution Analytics (RA) at the top of my assessment of analytics platforms back in 2014, and my opinion has not changed. RA combines the most complete and widely used analytics platform on the planet with the extensions and adaptations that enterprises need. The move parallels Oracle muscling in on Java territory when it acquired Sun Microsystems back in 2010. Microsoft’s newly announced support for R in SQL Server 2016 shows clearly that it is serious about R.
The first thing to understand about RA is that it offers two products. Revolution R Open (RRO) is free to download and is essentially a higher-performance version of the open source distribution. It also provides mechanisms for sharing code. Revolution R Enterprise (RRE) provides a scalable, high-performance platform for enterprise needs, along with security enhancements and support for big data platforms.
It should be obvious that this is not a platform for end users. It requires high levels of skill to use. Businesses that want to build statistical and predictive models will find RRE unparalleled in scope, because it supports the thousands of additional R packages, many of them developed by experts in their fields.
The success of this move by Microsoft depends almost entirely on a trained and experienced pool of analysts and data scientists with the skills to build effective solutions. Because R is the most widely used platform of its kind, there is indeed a large pool of experience and skills to draw on, and RA has always been a very active participant in the R community. There will be dissenters who object to the commercialization of R. In reality, though, many organizations will not use a technology unless a supplier takes responsibility for it. RA, and by implication Microsoft, will therefore be seen as the bridge that makes R an enterprise proposition.
Revolution R Open
The market-seeding mechanism used by RA is Revolution R Open (RRO). It can be downloaded freely and used in a commercial setting at no cost. This makes it a good way of testing the water without a commercial commitment. It has several advantages over the open source distribution of R. The first concerns performance, specifically the use of multi-threaded math libraries. The default R distribution is fine for modest analytical needs but can grind to a halt on more demanding workloads. The multi-threaded libraries speed up linear algebra operations such as matrix multiplication, which alleviates this to some extent. The enterprise edition will still be needed for full-scale enterprise deployments.
RRO also comes with the Reproducible R Toolkit, which supports code sharing and makes results easier to replicate. This includes the checkpoint package, which provides a reproducible way of using specific versions of R packages. It also needs stressing that RRO is fully compatible with all the packages that work with the open source distribution. In addition, a high-performance default CRAN repository provides a consistent, static set of packages to all RRO users. Like the open source distribution, RRO runs on Windows, Mac OS X and Linux.
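The date-based pinning idea behind a tool like checkpoint can be sketched in a few lines. Under this model, every package resolves to the latest version released on or before a chosen snapshot date. The Python sketch below uses a made-up release history, and its function names are hypothetical. It is not the checkpoint package’s API, and it is not how that package is implemented.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical release history: package -> [(release date, version)], sorted by date.
Releases = Dict[str, List[Tuple[date, str]]]


def version_on(history: Releases, package: str, snapshot: date) -> str:
    """Return the latest version released on or before the snapshot date.

    Assumes each package's release list is already sorted by date.
    """
    releases = history[package]
    dates = [released for released, _ in releases]
    idx = bisect_right(dates, snapshot)  # number of releases on or before snapshot
    if idx == 0:
        raise LookupError(f"{package} had no release on or before {snapshot}")
    return releases[idx - 1][1]


def resolve(history: Releases, packages: List[str], snapshot: date) -> Dict[str, str]:
    """Pin every requested package to the version that was current on the snapshot date."""
    return {p: version_on(history, p, snapshot) for p in packages}


if __name__ == "__main__":
    # Illustrative data only; these package names and dates are invented.
    history = {
        "pkgA": [(date(2014, 3, 1), "1.0"), (date(2014, 9, 1), "1.1")],
        "pkgB": [(date(2014, 6, 15), "0.9")],
    }
    print(resolve(history, ["pkgA", "pkgB"], date(2014, 7, 1)))
```

Pinning by date rather than by individual version numbers means a single date is enough to reproduce a whole, mutually consistent set of packages.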
Revolution R Enterprise
Revolution R Enterprise (RRE) is pretty much a different ball game from RRO and the standard R distribution. At its core it is standard R, but that is where the similarity ends. The issues RRE addresses include scalability, support for big data, deployment, performance, security and application integration.
Much effort has gone into parallel execution. ScaleR provides algorithms optimized for parallel, distributed execution on big data. Standard R holds the data it analyzes in memory. ScaleR’s algorithms instead work through data in chunks that can be processed in parallel and spread across nodes. This largely removes the memory limitation and lets analyses scale.
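The chunk-and-combine idea behind algorithms of this kind can be illustrated with a small Python sketch. Each chunk of data is reduced to a compact summary: a count, a mean, and a sum of squared deviations. Summaries are then merged with the pairwise update of Chan et al. The function names are hypothetical. This is not ScaleR’s API or its actual implementation. It only shows why an algorithm written this way is not bounded by memory and can be spread across workers.

```python
from functools import reduce
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Summary(NamedTuple):
    """Partial statistics for a chunk: count, mean, and sum of squared deviations (M2)."""
    n: int
    mean: float
    m2: float


def summarize(chunk: List[float]) -> Summary:
    """Summarize one chunk with Welford's single-pass update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in chunk:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Summary(n, mean, m2)


def combine(a: Summary, b: Summary) -> Summary:
    """Merge two summaries (Chan et al. pairwise update); mathematically associative."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)


def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks from any iterable, e.g. rows streamed from a file."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    buf: List[float] = []
    for v in values:
        buf.append(v)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


def chunked_mean_var(values: Iterable[float], chunk_size: int = 1000) -> Tuple[float, float]:
    """Return (mean, sample variance), reading the data one chunk at a time."""
    # map() is lazy, so only one chunk is held in memory at a time. Because combine()
    # is associative, the per-chunk summaries could equally be computed on separate
    # cores or servers and merged in any grouping.
    summaries = map(summarize, chunks(values, chunk_size))
    total = reduce(combine, summaries, Summary(0, 0.0, 0.0))
    if total.n < 2:
        raise ValueError("need at least two values for a sample variance")
    return total.mean, total.m2 / (total.n - 1)
```

The sketch merges (count, mean, M2) summaries rather than raw sums of squares. This avoids the cancellation error of the textbook sum-of-squares formula when the data are large or have a large mean.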
Two other components are core to RRE. DistributedR is an adaptable parallel execution framework that provides communications, storage integration and memory management services. These enable ScaleR algorithms to analyze large data sets and to scale from single-processor workstations to clustered systems with hundreds of servers. ConnectR provides a library of connectors to a wide range of data sources. These include ‘legacy’ SAS and SPSS data, most relational databases, and Hadoop HDFS and Hive. Other components of RRE include RevoScaleR, the R package that exposes the ScaleR functions, along with DevelopR and DeployR. DevelopR is an IDE for R on Windows platforms. DeployR is a server for deploying R analytics as web services that other applications can call. Finally, RA offers AdviseR, a set of packaged or customized support, consulting and training services.