algorithms.io
algorithms.io provides a web-hosted service to collect data, generate classification models and score new data. Code is added to web and portable device applications, which stream data to the algorithms.io service, where it is captured and processed using random forest, support vector machine, k-means, decision tree, logistic regression and neural network algorithms. The resulting model is then used to categorize new data. The results are passed back as a parsed data stream to power apps, or as reports and visualizations. A set of APIs is provided so that developers can integrate machine learning into web and mobile applications. The algorithms are categorized as anomaly detection, clustering, classification and collaborative filtering.
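To make the integration model concrete, here is a minimal sketch of what streaming an event from an application might look like. The endpoint URL, field names and authentication header are assumptions for illustration only, not the documented algorithms.io API.

```python
import json
import requests  # third-party HTTP client

# Hypothetical endpoint and key -- placeholders, not the real algorithms.io API.
ENDPOINT = "https://api.example.com/v1/events"
API_KEY = "YOUR_API_KEY"

def send_sensor_event(device_id, readings):
    """Stream one sensor reading to the hosted service and return the scored result."""
    payload = {"device_id": device_id, "readings": readings}
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": "Bearer " + API_KEY,
                 "Content-Type": "application/json"},
        data=json.dumps(payload),
        timeout=5,
    )
    response.raise_for_status()
    # The service is described as returning results as a parsed data stream;
    # here we simply return the JSON body.
    return response.json()

print(send_sensor_event("wearable-42", {"accel_x": 0.12, "accel_y": -0.98}))
```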
The firm also offers consulting services which cover data modeling, algorithm development and custom applications. Use cases include wearable sensors, manufacturing and healthcare.
BigML
BigML offers a cloud-based predictive analytics capability that is both refreshingly straightforward and extremely powerful. These two qualities are usually mutually exclusive, but by using decision trees and decision tree ensembles in conjunction with some very useful visualizations, the whole process of building and testing predictive models becomes much easier.
The steps required to build, test and use a model are simple enough. Upload data to the BigML SaaS platform, build a model (which might be just a single operation), test it against held-out data and, if all is well, download it as Java, Python, PMML or any one of several other formats. Then plug it into your production systems.
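As an illustration of how short that workflow can be, here is a sketch using BigML's open-source Python bindings (the bigml package). The file name, credentials and input fields are placeholders, and the calls should be checked against the current bindings before use.

```python
from bigml.api import BigML

# Credentials are normally picked up from the BIGML_USERNAME and BIGML_API_KEY
# environment variables; they are shown inline here only for clarity.
api = BigML("my_username", "my_api_key")

# Upload data, build a dataset and then a single decision tree model.
source = api.create_source("sales_history.csv")   # placeholder file
dataset = api.create_dataset(source)
model = api.create_model(dataset)

# Score a new record against the model.
prediction = api.create_prediction(model, {"region": "EMEA", "visits": 12})
api.pprint(prediction)
```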
The users of BigML will range from skilled business users through to data scientists and consultants. Obviously some level of knowledge and training is necessary, but a savvy business user should get the hang of things very quickly.
While BigML will produce a decision tree with considerable speed (typically in seconds or minutes), the real power is to be found in the decision tree ensembles, where many trees are created and their predictions averaged. A technique known as bagging is used: the data are randomly sampled multiple times (with replacement) and a tree is built from each sample. The effect is similar to having a much larger data set, and it nearly always produces much more accurate models.
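For readers unfamiliar with bagging, the sketch below shows the idea in miniature using scikit-learn rather than BigML itself: each tree is trained on a bootstrap sample drawn with replacement, and the ensemble averages their votes.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Any labelled table would do; this bundled data set keeps the example self-contained.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(50):
    # Bootstrap sample: same size as the training set, drawn with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# 'Average' the ensemble: majority vote across the individual trees.
votes = np.mean([t.predict(X_test) for t in trees], axis=0)
ensemble_pred = (votes >= 0.5).astype(int)

single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("single tree accuracy    :", (single.predict(X_test) == y_test).mean())
print("bagged ensemble accuracy:", (ensemble_pred == y_test).mean())
```

On most runs the bagged ensemble edges out the single tree, which is the effect BigML exploits at much larger scale.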
The decision tree graphics are not only visually appealing, but contain a great deal of information and are interactive. A Sunburst visualization shows which classifications have the most support and confidence in a highly graphic manner, allowing users to quickly home in on the most useful classifications.
In my opinion the focus on decision tree ensembles is very appropriate. Various ensemble methods have won the vast majority of machine learning competitions in recent years, and have been called the most significant development in machine learning over the last decade. This is a good strategic decision by the founders of BigML.
The technology has found a broad range of applications including predictive marketing, fraud detection, recommendation systems, image analysis, pricing optimization and many others that satisfy very specific needs.
BigML went live to the public less than a year ago and it clearly plans to roll out further capabilities and products. These include additional learning methods (k-means, for example), non-linear decision trees and time series analysis. It will also be beefing up its cloud offerings to include virtual private clouds (VPC) and multi-cloud support for other cloud platforms such as Azure.
BigML satisfies the maxim that ‘things should be made as simple as possible, but not any simpler’ very well, and is worthy of investigation by any organization (of any size) that needs to employ predictive tools.
dotplot
Dotplot provides data mining, statistics, text mining and predictive analytics tools in an integrated, highly graphical cloud-based environment. All that is needed to use dotplot is a browser. Resulting models can be integrated with other applications via web services using the SOAP and REST protocols. Dig a little deeper and dotplot is actually a much-needed graphical front end to R and Weka functions. This accounts for the very large number of functions supported and the broad capability.
While dotplot provides an extremely easy-to-use interface, it supports some very advanced functionality, particularly the ability to pass parameters (as well as data sets) from one function to another. Many parameters are hidden from novice users, but can be revealed for experts (some R functions have up to 100 parameters). For larger organizations wishing to deploy models within the enterprise, dotplot provides an on-premise execution engine.
In the pipeline are model building wizards, visualization tools and apps. The wizards will help in the construction of common models and the visualization tools with data presentation. The apps are solutions created by dotplot or by dotplot users, and the active dotplot community is not only a support and learning environment, but a source of analytics solutions which will eventually be offered through an App Store.
Dotplot is positioned as a cheaper alternative to large products such as SPSS and SAS. For many users this will be the case, but dotplot does not compete with the large-scale implementations of such products in large corporations. In fact this is not the target market, and dotplot has been created to bring practical analytics to both medium and large organizations.
The company was founded in 2012 and is based in Munich, Germany. The vision of the CEO is that dotplot should become a standard graphical interface to R, and that a large community of users will create many off-the-shelf analytics solutions in the same way that R users have created hundreds of analytics tools as an extension to R.
Pricing is modest and even includes a free Personal subscription with limitations on the functions available, data size and CPU usage. The Premium service comes in at a modest $59 per month and provides unlimited resources (except for storage, which is capped at 100 GB) and support. For larger organizations enterprise deals are available with considerable discounts on the flat rate.
In summary, dotplot will be one of the most cost-effective statistical analysis and data mining solutions for a large population of users, and the free subscription provides a good opportunity to try it out.
Kontagent
Kontagent provides analytical tools specifically targeted at customer analysis. It comprises three components:
- kSuite Mobile provides a variety of indicators for customer acquisition, engagement, conversion and retention. It also supports customer event tagging and analysis.
- kSuite Social facilitates greater web and viral presence and exploitation of social apps.
- kSuite Datamine provides a very fast, big data environment for data mining activities. Analysts use SQL syntax to construct queries for tasks such as fraud and churn analysis, as illustrated below.
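As a flavor of the kind of query an analyst might run, the sketch below executes a simple churn-style query over a toy in-memory table; the table and column names are invented for the example and bear no relation to Kontagent's actual schema.

```python
import sqlite3

# Toy in-memory events table standing in for a big data event store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id TEXT, event_name TEXT, event_date TEXT);
    INSERT INTO events VALUES
        ('u1', 'session_start', '2013-09-01'),
        ('u1', 'purchase',      '2013-09-02'),
        ('u2', 'session_start', '2013-08-01');
""")

# Flag users with no activity in the 30 days before the end of the window.
churn_query = """
    SELECT user_id, MAX(event_date) AS last_seen
    FROM events
    GROUP BY user_id
    HAVING MAX(event_date) < date('2013-09-30', '-30 days');
"""
for user_id, last_seen in conn.execute(churn_query):
    print(user_id, "last seen", last_seen)
```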
MicroStrategy Analytics Express
MicroStrategy Analytics Express is a cloud-based analytics solution for business users. It supports a wide variety of visualizations and allows reports to be delivered to any number of recipients. A wide variety of data sources are supported (Excel, CSV, MapReduce, columnar databases, Salesforce) and presentation modes include reports, graphs, charts and dashboards, all of which can be shared. It does not, however, support predictive analytics (Analytics Enterprise is required for this). Both web browser and iPad access are supported.
Analytics Express is free for a year, after which a subscription applies.
Swift IQ
Swift IQ provides cloud-based data consolidation and machine learning services. Swift Access allows data from a wide variety of sources to be imported into cloud-based big data storage, with relevant security and access management. Data can be imported using CSV, XML or a REST API. Once imported into the big data storage it can be accessed by any users or applications that have the privileges to do so. Analytics are also available that provide information on the available data and its usage.
Swift Predictions is a machine learning environment with a large collection of algorithms. These come with a REST API for integration into applications and use by third parties. Developer tools are also provided along with a variety of starter applications. A number of solutions for retail are also available.
Teradata Aster Cloud Edition
Teradata Aster Cloud Edition provides the Aster Discovery Platform in a cloud environment. This is a massively parallel solution with embedded MapReduce and a facility for multi-structured data sources and types. Applications can be embedded within the database engine for very fast analysis of large data sets, an approach often called in-database analytics.
SQL-MapReduce is used, which allows developers to write powerful SQL-MapReduce functions in languages such as Java, C#, C++ and R to create analytics processes that execute within the database. This has found application in fraud detection, graph analysis and social behavior analysis.
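The programming model behind SQL-MapReduce is essentially "run a custom function over groups of rows inside the query". The toy sketch below mimics that map/shuffle/reduce shape in plain Python; it is a conceptual illustration only, not Aster's actual API.

```python
from collections import defaultdict

# Toy rows standing in for a table of click events: (user, url).
rows = [("alice", "/home"), ("bob", "/pricing"), ("alice", "/checkout")]

def map_phase(row):
    """Emit (key, value) pairs for one input row."""
    user, _url = row
    yield user, 1  # count one page view per user

def reduce_phase(key, values):
    """Aggregate all the values emitted for a single key."""
    return key, sum(values)

# Shuffle: group mapped values by key, as the engine would do in parallel.
groups = defaultdict(list)
for row in rows:
    for key, value in map_phase(row):
        groups[key].append(value)

print([reduce_phase(k, v) for k, v in groups.items()])  # [('alice', 2), ('bob', 1)]
```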
wise.io
wise.io provides an extremely fast implementation of the Random Forest machine learning technique, suitable for classifying complex, high-dimensional data orders of magnitude more quickly than most alternatives. The technology was originally developed to search data generated by astronomical observations. Its power is well illustrated by its ability to learn and categorize handwriting within just a few minutes, compared with a week of learning using the favorite technology for this problem, the support vector machine.
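The comparison is easy to reproduce in miniature with off-the-shelf implementations. The sketch below times a random forest against an RBF-kernel SVM on scikit-learn's small bundled digits sample; it has nothing to do with WiseRF itself, and at this toy size both finish in well under a second, but kernel SVM training cost grows much faster than the forest's as the number of examples rises.

```python
import time
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small handwritten-digit sample; MNIST-scale data shows the gap far more clearly.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, clf in [("random forest", RandomForestClassifier(n_estimators=100, random_state=0)),
                  ("SVM (RBF)", SVC())]:
    start = time.time()
    clf.fit(X_train, y_train)
    elapsed = time.time() - start
    print(f"{name:14s} train time {elapsed:.2f}s  accuracy {clf.score(X_test, y_test):.3f}")
```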
I spoke with Joshua Bloom, the CEO of the company and Associate Professor of Astrophysics at UC Berkeley. He was keen to emphasize that the learning algorithm is just a small part of the overall model production and deployment cycle, although having a resource that executes at this speed makes many otherwise difficult problems amenable to a solution.
The company offers a SaaS facility called Machine Intelligence Engine, where users upload their data, build a model and then either download it or have it execute in the SaaS environment. Fees are levied according to the level of usage, and users typically require a period of hand-holding, which may range from a few hours to a few days. WiseRF, on the other hand, is downloadable and allows models to be built in-house. It comes in three flavors (Pine, Oak and Sequoia) with increasing scalability and capability. A 15-day trial can be downloaded.
Applications range from OTC trading through to industrial safety, and while the accuracy of Random Forest is widely appreciated, having these very high levels of performance means that ‘real-time’ problems can be addressed.
wise.io is supported by its customers and is cash-flow positive (a rare state for a startup), and it would undoubtedly make a nice acquisition should the firm wish to go that way.
Yottamine
The Yottamine Predictive Services software (using Amazon Web Services) delivers very high levels of performance for near real-time predictive analytics. This makes the use of support vector machine (SVM) methods practical for many applications that would otherwise require impractically large amounts of compute power. Big data capability is provided through the Hadoop architecture, which can handle terabyte-size training sets. A simple, flexible API is provided to facilitate a web service interface for applications, and this can be run locally or in the cloud.
The high-performance characteristics of Yottamine are derived from its parallel programming capabilities, which allow large clusters of processors to be dedicated to a single job.