Microsoft Cortana Analytics Review Summary.
Microsoft Cortana Analytics is a bundling of several Azure services and various analytics platforms. It embraces big data, data warehousing, data management and preparation, machine learning, business intelligence and the emerging Internet of Things. Just a few years ago the natural candidate to offer this level of capability would have been IBM – but Microsoft is now a serious contender in this age of big data and real-time, event driven analytics. Despite the newness of much of this capability Microsoft is already attracting some very large customers, and components such as Azure Stream Analytics and Event Hubs mean businesses can build their IoT applications today.
Power BI has moved from being a fairly unattractive Excel based toolset to a desktop data visualization tool, with cloud sharing, in just a year. Power BI now challenges the premier data visualization tools such as Qlik and Tableau, and will undoubtedly take market share. Azure Machine Learning provides APIs for developers, and tools to create models. Microsoft has also embraced R, both within SQL Server and through its acquisition of Revolution Analytics.
The speed, quality and decisiveness of Microsoft in the big data and analytics space are impressive, and the modest pricing of many services and tools will be attractive to organizations of all sizes. The one gap in all of this is optimization. Microsoft Excel does support a solver, but it isn’t up to the large-scale optimization tasks that a large business needs to perform. So maybe there will be another acquisition soon.
Data
There is a great deal happening in Microsoft’s data management products and solutions. Several cloud offerings are available, addressing big data and data warehousing. SQL Server is now available on Linux and has been enhanced to support R. And a series of cloud products have been made available to help manage, transform, route and prepare data for analytical and production uses. The Internet of Things is also well anticipated by Microsoft through its Azure Stream Analytics platform and managed services such as Event Hubs. The main competitors of course are IBM and Amazon, but Microsoft seems to have a momentum that will keep it ahead in the game.
Event Hubs
Azure Event Hubs is a cloud based facility supporting the rapid ingestion of data from devices, web sites, applications, and any other data source which generates very high volumes of data. It plays a role similar to Apache Kafka, but is a hosted solution rather than a software component. Event Hubs is already used to ingest in excess of a million events per second and preserves event order on a per-device basis. By default it retains messages for a day, but at extra cost these can be retained for up to seven days. Pricing is modest too, and is based on the number of events processed: a million events cost less than thirty cents. Support for Advanced Message Queuing Protocol (AMQP) and HTTP allows many platforms to work with Event Hubs. Native client libraries also exist for popular platforms.
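To make the per-device ordering point concrete, here is a minimal sketch in plain Python. It is a toy model and not the Event Hubs API: it assumes events are routed to a fixed number of append-only partitions by a stable hash of the device id, so that each device's events stay in arrival order even when many devices are interleaved. The names partition_for, ingest and events_for, and the partition count, are hypothetical and chosen only for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed value, for illustration only


def partition_for(device_id, num_partitions=NUM_PARTITIONS):
    """Map a device id to a partition using a stable hash (an assumption of this sketch)."""
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions


def ingest(events, num_partitions=NUM_PARTITIONS):
    """Append each (device_id, payload) event to the partition chosen for its device."""
    partitions = defaultdict(list)
    for device_id, payload in events:
        partitions[partition_for(device_id, num_partitions)].append((device_id, payload))
    return partitions


def events_for(partitions, device_id, num_partitions=NUM_PARTITIONS):
    """Read back one device's payloads in the order they were appended."""
    log = partitions[partition_for(device_id, num_partitions)]
    return [payload for dev, payload in log if dev == device_id]


# Interleaved events from two hypothetical devices.
events = [
    ("sensor-a", 1),
    ("sensor-b", 10),
    ("sensor-a", 2),
    ("sensor-b", 11),
    ("sensor-a", 3),
]
partitions = ingest(events)
```

Because all events from one device land in the same append-only list, reading that list back yields them in the order sent, regardless of how other devices' events were interleaved on the way in.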
Data Catalog
The Azure Data Catalog addresses the issue of data visibility. Most businesses simply do not have a documented record of their data assets – the data that is hidden away often being referred to as ‘dark data’. The catalog allows those with responsibility for data to register data sources with controls over who can see what. This is a fully managed service and will be particularly useful to data scientists, business analysts, and developers.
Data Factory
The Azure Data Factory is essentially a mechanism for pulling raw data in from almost any data source, refining it, and then distributing it to the applications that need it. It allows data to be shaped and transformed as needed, and then published to relevant applications. For example social data can be ingested, transformed and then delivered to appropriate applications – a sentiment analysis app or other systems that need to consume social data that has been formatted for particular needs. The term ‘factory’ is quite appropriate, since Data Factory supports the scheduling, orchestration and management of processes that make data fit for purpose. Various tools provide a visual representation of the various data pipelines, including the ability to create alerts for exceptions.
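The ingest, transform and publish flow described above can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not Data Factory's own interface: run_pipeline, clean_text, add_length and the sample posts are all hypothetical, and the alert is simply a callback invoked when a transformation step raises an exception.

```python
def run_pipeline(source, steps, sinks, on_error):
    """Pull records from a source, apply each step in order, and publish to every sink.

    A record that fails in any step is reported through on_error and skipped.
    Returns the number of records delivered.
    """
    delivered = 0
    for raw in source:
        record = raw
        try:
            for step in steps:
                record = step(record)
        except Exception as exc:  # the alert path for exceptions
            on_error(raw, exc)
            continue
        for sink in sinks:
            sink(record)
        delivered += 1
    return delivered


def clean_text(post):
    """Normalize the text field (hypothetical transformation)."""
    return {**post, "text": post["text"].strip().lower()}


def add_length(post):
    """Attach a derived field a downstream app might want (hypothetical)."""
    return {**post, "length": len(post["text"])}


# Hypothetical raw social data; the second record is malformed on purpose.
raw_posts = [
    {"user": "ann", "text": "  Loving the new dashboard!  "},
    {"user": "bob", "text": None},
    {"user": "cy", "text": "Slow uploads today"},
]

sentiment_inbox = []  # stands in for a consuming sentiment analysis app
alerts = []           # stands in for an alerting channel

delivered = run_pipeline(
    raw_posts,
    steps=[clean_text, add_length],
    sinks=[sentiment_inbox.append],
    on_error=lambda rec, exc: alerts.append((rec["user"], type(exc).__name__)),
)
```

The point of the design is separation: sources, transformation steps and consumers are independent, so a new consuming application can be added as another sink without touching the steps that prepare the data.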
HDInsight
HDInsight is a managed big data service. It embraces Hadoop, Spark, HBase and Storm and obviously avoids the need to set up an in-house big data infrastructure. Support is provided for a variety of languages, including Java and .NET languages such as C#. It is also integrated with Excel for analysis. HBase, the columnar database, is also included, providing the ability to handle transaction processing. Both Linux and Windows clusters are supported.
Data Lake
The Azure Data Lake acts as a dumping ground for high volumes of complex data that utilizes the HDInsight infrastructure and adds analytics services. U-SQL is included – a new version of SQL that includes greater coding support, and comes with a scalable distributed runtime. YARN is central to job execution. This is primarily an analytics platform for data scientists, analysts and others wishing to create visualizations and other forms of analysis.
SQL Data Warehouse
The Azure SQL Data Warehouse scales to petabyte size and provides support for Transact-SQL and seamless interfaces to Power BI, Azure Machine Learning, HDInsight and Data Factory. At the time of writing this service is still in preview.
Analytics
In just over a year Microsoft has moved from having almost no analytics tools to providing market leading tools and platforms. Until the introduction of Power BI in early 2015, analysis was the domain of Excel, which is not particularly user friendly. Power BI now competes with the likes of Tableau for ease of use and power, and of course connects to the considerable analytics infrastructure now available from Microsoft. A few months after the introduction of Power BI, Microsoft announced the Azure Machine Learning platform for the creation of predictive models and other applications. And to maintain the pace SQL Server now supports R, and the R platform acquired through Revolution Analytics is available as a Microsoft product. This has placed Microsoft firmly up with the leaders of business analytics technologies, and we should expect more.
Power BI
Power BI comes as both a desktop application and a cloud service, and is the primary data visualization product offered by Microsoft. The desktop version connects to a wide variety of data sources, and has a seamless interface to the Azure SQL Data Warehouse. Graphs, charts and dashboards can be uploaded to the cloud service for sharing, and the cloud service also supports chart and dashboard creation. A full review of Power BI can be found here.
Azure Machine Learning
Introduced in 2015, Azure Machine Learning provides development tools and machine learning services. Azure ML Studio provides the development toolkit using a drag-and-drop visual interface, and the capabilities can be extended with R and Python.
Solutions
The Analytics Gallery provides solutions to common machine learning problems, and allows developers to place their solutions in the Azure market, in addition to the standard solutions provided by Microsoft (e.g. text analytics, sentiment analysis and face recognition). A variety of APIs are also available to access standard solutions, such as recommendations, customer churn and speech recognition.