Can Dell step in the same river twice? We all know that it very successfully commoditised computing hardware, and particularly the PC market. It now has another, similar aim – to commoditise computing infrastructure, and particularly business analytics. To this end it has acquired an impressive array of technologies supporting data and application integration, machine learning and predictive analytics, business intelligence and database management. It is also setting up partnerships with Algorithmia (a marketplace for machine learning algorithms) and Microsoft, whose Azure ML is a machine learning platform with developer toolkits.
Dell believes that machine learning and advanced analytics, currently an esoteric domain, will become commoditised, so that business users gain direct access to the predictive algorithms and models they need. This is a view I concur with, since the pattern is about as close to a law of gravity as the technology industry gets: esoteric technologies always make their way into the mainstream sooner or later, in the same way that we don't need to understand quantum mechanics to use a laptop (even though all semiconductor devices are designed using quantum mechanics).
The recent acquisitions by Dell (some of them multi-billion dollar deals) form an impressive suite of technologies to support businesses as they seek to embrace analytics. To this end Dell has addressed the complete picture – infrastructure, data, business intelligence and machine learning. Here is a rundown of its recent acquisitions:
In 2010 Dell acquired Boomi, an integration platform as a service (iPaaS). In everyday English this means Boomi provides a cloud-based service to integrate diverse data and applications, manage master data, and manage the proliferating number of application programming interfaces (APIs) that applications and services expose so that they can communicate with each other. Clearly this is part of Dell's master plan to enable the commoditisation not just of analytics, but of the whole computing infrastructure. Boomi supports dozens of applications and data sources.
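To make the integration problem concrete, here is a minimal sketch, not of Boomi itself (whose connectors are configured rather than hand-coded), but of the sort of point-to-point API plumbing an iPaaS abstracts away. The endpoints and field names are hypothetical.

```python
import requests

# Hypothetical endpoints: an iPaaS such as Boomi replaces hand-written
# glue like this with configurable, centrally managed connectors.
ERP_ORDERS_URL = "https://erp.example.com/api/v1/orders"
CRM_ACTIVITY_URL = "https://crm.example.com/api/v1/activities"

# Pull new orders from one application...
orders = requests.get(ERP_ORDERS_URL, params={"status": "new"}, timeout=10)
orders.raise_for_status()

# ...transform each record, and push it into another application.
for order in orders.json():
    activity = {
        "customer_id": order["customer_id"],  # hypothetical field mapping
        "type": "order_placed",
        "amount": order["total"],
    }
    requests.post(CRM_ACTIVITY_URL, json=activity, timeout=10).raise_for_status()
```

Multiply this by dozens of applications and you have exactly the brittle point-to-point mesh that an iPaaS is designed to replace.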
In 2012 Dell acquired Quest Software, and specifically the Toad suite of products, which provides data and database management, automates many reporting tasks and supports data validation. Since Dell supplies server technologies, many of which are used for database applications, this acquisition makes obvious sense. There are many utilities within Toad, including Data Modeler, and solutions for specific databases.
In 2013 Dell acquired Kitenga, a big data analytics platform that is now embedded within Statistica.
In 2014 Dell acquired StatSoft, developer of Statistica and a major player in statistical analysis and machine learning. Statistica is a very sophisticated product capable of addressing enterprise needs, with an integrated platform covering data mining, text analytics, statistical analysis and business rules management. It is clear that Statistica is the hub around which a great deal will revolve. It can consume models built in other languages and provide the model management capabilities that are missing from many other platforms. It can also export models in a variety of languages so they can be used in production applications. The relationship with Algorithmia and Azure ML will enhance this ability to consume models and use them in a well managed environment.
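PMML (Predictive Model Markup Language) is the usual lingua franca for this kind of model interchange. As a rough illustration of what "consuming a model built elsewhere" looks like in practice, here is a sketch using the open-source pypmml package, which is my choice for illustration and not part of Statistica; the file name and input fields are invented.

```python
from pypmml import Model  # open-source PMML scoring library (needs a Java runtime)

# Load a model that was trained and exported in some other tool.
# "churn_model.pmml" is a hypothetical file for illustration.
model = Model.fromFile("churn_model.pmml")

# Score a new record. PMML carries the data dictionary along with the
# model, so the consumer needs no knowledge of how it was built.
print(model.predict({"tenure_months": 7, "monthly_spend": 42.5}))
```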
Since this article was initially prompted by the obvious question – what on earth is Dell doing buying Statistica? – here is a brief review of the product.
Dell STATISTICA embraces most of the analytics tools that organizations large and small will need. One of the most powerful aspects of the product set is the level of integration, with seamless connections between its disparate modes of analysis. Statistics, machine learning, data mining and text mining are all at the disposal of the user without having to migrate from one environment to another. It also features a graphical interface in which workflows can be constructed to process data, and these can of course be saved and reused by anyone with the relevant permissions.
There would be little point in listing the statistics features and functions, since they encapsulate the functionality most users will need. The data mining tools include C&RT, CHAID, I-Tree (interactive decision trees which can be manually modified), boosted trees, random forests, neural networks, clustering, association mining and link analysis. Bayesian classifiers, support vector machines and k-nearest neighbors (k-NN) are categorized as machine learning. The text mining facility features a web crawler to download data from the Internet, and also supports stemming, stop words, inclusion words, synonyms (typically used in sentiment analysis) and word morphology functions (minimum word length, permissible characters, etc.). Bag-of-words analysis will generate a variety of statistics which can be used as features in data mining activities.
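As a rough illustration of that last point, bag-of-words statistics feeding a data mining model, here is a minimal sketch using the open-source scikit-learn library rather than Statistica itself; the documents and labels are invented.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Invented training data: short documents with sentiment labels.
docs = [
    "great product, works exactly as described",
    "terrible support and very slow delivery",
    "excellent value, would buy again",
    "broken on arrival, poor quality",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Bag-of-words features (with stop-word removal) feed a random forest,
# mirroring the text mining to data mining handoff described above.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(docs, labels)
print(model.predict(["slow delivery but great product"]))
```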
For larger organizations, STATISTICA Enterprise provides an enterprise working environment for business users as well as analysts. Users are typically presented with analyses, reports and dashboards they can use, and various forms of monitoring and alerting are also available.
The deployment of models created in STATISTICA is very flexible, with export to PMML, C, C++, C#, Java, SAS, SQL stored procedures and Teradata – and of course deployment into STATISTICA Enterprise.
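To give a flavour of what PMML deployment means in practice, here is a hedged sketch using the open-source sklearn2pmml package, again not Statistica itself: a model is trained in one environment and exported as PMML, after which any PMML-aware scoring engine (a database, a Java service, or indeed an analytics platform) can run it.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True)

# Wrap the estimator in a PMMLPipeline so it can be converted to PMML.
pipeline = PMMLPipeline([("classifier", DecisionTreeClassifier(max_depth=3))])
pipeline.fit(X, y)

# Export: the resulting XML file is portable to any PMML scoring engine.
# (The conversion step requires a local Java runtime.)
sklearn2pmml(pipeline, "iris_tree.pmml")
```

The point of such an interchange format is precisely the commoditisation Dell is betting on: the model becomes an artefact that production systems can consume without caring which tool produced it.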