For IBM, predictive analytics is largely a data management and infrastructure issue. In my conversations with them they stress the data management aspect in particular, and with good reason. Applying algorithms to data and building models, which is primarily accomplished with SPSS, is really just a small part of the story. Managing large data volumes and deploying models into the production environment are the more challenging aspects of analytics, and they are something IBM does very well.
The IBM analytics solution will primarily be of interest to large organisations that want more than a point solution and aim to create a viable, long-term analytics infrastructure and capability. To this end IBM offers its InfoSphere data management and infrastructure products, and the SPSS suite of analytical tools for both analysts and end users. The combination represents the premier analytical solution currently available, and IBM also has a number of vertical solutions to offer. It is a fairly expensive solution, but in many ways it is unchallenged.
Data Collection Family
This suite of products from IBM is primarily aimed at the design, creation, deployment, analysis and reporting of surveys. It provides a top-to-tail capability that supports various means of survey distribution (web, paper, phone, in-person), along with the supporting technology to capture the results, including scanning of documents and text processing.
The SamplePower utility provides a means of establishing survey sample size, something that would normally require a skilled statistician. This sets the tone for the whole Data Collection product set, since virtually all elements of the process can be handled by users. This does not, however, include the analytics used to draw conclusions from the data, which is the domain of the Statistics and Modeler packages.
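To give a feel for the kind of calculation a sample size tool takes off the user's hands, here is a minimal sketch of the standard textbook formula for estimating a proportion, with an optional finite population correction. It is not a description of how SamplePower works internally. The function name `sample_size` and its parameters are hypothetical and chosen purely for illustration.

```python
import math
from statistics import NormalDist


def sample_size(margin_of_error, confidence=0.95, proportion=0.5, population=None):
    """Estimate respondents needed to measure a proportion (illustrative only).

    Textbook formula n0 = z^2 * p * (1 - p) / e^2, optionally adjusted with
    the finite population correction. This is an assumption for illustration,
    not the method used by any IBM product.
    """
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction for small target populations
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    print(sample_size(0.05))                    # large (effectively unlimited) population
    print(sample_size(0.05, population=10000))  # known population of 10,000
```

Using a proportion of 0.5 is the conservative choice, since it maximises the required sample when nothing is known in advance about the likely answer.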
IBM SPSS Statistics
This is perhaps the most widely used set of commercial statistical products in the world. The capability ranges from end user marketing tools through to specialised statistical analysis and, of course, the very well respected SPSS analyst workbench. There isn’t much utility in detailing the features of the statistics capability because it does pretty well everything. A few techniques that are not really statistical in nature, such as neural networks, are also available.
IBM SPSS Modeler
This employs data mining techniques to find relationships within data. The Professional version supports the creation of predictive models using classification, association and segmentation techniques. Modeler Premium adds the ability to process unstructured data from the web, text, email, social data and so on. Again, there is little point listing all the techniques supported by Modeler, since most conceivable options are present (Bayes, SVM, K-means, etc.).
IBM SPSS Decision Management allows predictive models to be integrated with business rules for deployment into production systems. The Collaboration and Deployment option supports the sharing of analytical assets and provides an environment to automate the analytical process.
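The general idea of combining a predictive model with business rules can be sketched in a few lines. This is only an illustration of the concept, assuming a hypothetical churn score produced by some model. The `Customer` fields, the `decide` function, the threshold and the action names are all invented for the example and do not reflect any IBM API.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    churn_score: float    # hypothetical model output between 0 and 1
    is_premium: bool
    open_complaints: int


def decide(customer, threshold=0.7):
    """Combine a model score with simple business rules (illustrative only)."""
    # Business rule first: customers with open complaints go to service,
    # whatever the model says.
    if customer.open_complaints > 0:
        return "route_to_service"
    # Otherwise the model score drives the action, and a second rule
    # selects the offer tier.
    if customer.churn_score >= threshold:
        if customer.is_premium:
            return "retention_offer_premium"
        return "retention_offer_standard"
    return "no_action"


if __name__ == "__main__":
    print(decide(Customer(churn_score=0.85, is_premium=True, open_complaints=0)))
```

The point of the sketch is that the model supplies a score while the rules decide what is actually done with it, so the rules can change without the model being rebuilt.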
InfoSphere addresses more than predictive analytics requirements and is covered fully in a separate paper. In broad terms, however, the product suite includes InfoSphere Warehouse for traditional data warehousing; InfoSphere Information Server, DataStage and Data Replication to support integration and data staging; Master Data Management; and Big Data analytics, which is based on Apache Hadoop technology.
Big Data analytics not only supports large data sets, but also provides sufficient performance for real-time analytics and can accommodate very high volume streaming data. This will become more important as information from various sensors (e.g. RFID) and real-time market information become more widely used.