Recent developments in database architectures and storage methods have not necessarily made life any easier. We can store data in various encoded forms, as arrays, as hierarchies, in large quantities, and so on. But the core problem of bringing data together for analysis has not necessarily been solved, despite the current fascination with data lakes and other mechanisms for merging data sources. The problem with all schema-on-store mechanisms is that the scope of queries that can be made against the schema is constrained by how that schema was designed.
RAW Labs uses some very interesting technology and techniques to provide a workable schema-on-read solution for analysis. The company could quite legitimately call its technology AI, although the platform's solid academic origins mean it is presented in a more sober manner. The claim is that RAW infers a data schema from the query that is launched against multiple, diverse data sources. It does this using advanced and novel mathematical techniques drawn from category theory. The net result is that RAW learns how data is being used and, through smart caching, optimizes access to it. Obviously
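RAW's own schema inference is based on category theory and is not described in enough detail here to reproduce. As a much simpler illustration of the general schema-on-read idea, the sketch below infers field types at read time, and only for the fields a query actually references, rather than requiring a schema to be declared when the data is stored. The function names, the input format (one JSON record per line) and the type-widening rules (mixed int and float widen to float; a missing or null value marks a field nullable with a trailing `?`) are all assumptions made for illustration.

```python
import json
from typing import Any, Dict, Iterable, List, Set


def type_name(value: Any) -> str:
    """Coarse type of a non-null parsed JSON value (bool is checked before int)."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    return "record"  # JSON objects parse to dicts


def resolve(observed: Set[str]) -> str:
    """Collapse the set of types seen for one field into a single type."""
    if not observed:
        return "unknown"  # only nulls or missing values were seen
    if observed == {"int", "float"}:
        return "float"  # widen mixed numerics
    if len(observed) == 1:
        return next(iter(observed))
    return "any"  # incompatible types: fall back to a catch-all


def infer_schema(lines: Iterable[str], fields: List[str]) -> Dict[str, str]:
    """Hypothetical schema-on-read: infer types only for the queried fields.

    Fields not listed in `fields` are never inspected, so the schema is
    driven by the query rather than fixed when the data was stored.
    """
    observed: Dict[str, Set[str]] = {f: set() for f in fields}
    nullable: Dict[str, bool] = {f: False for f in fields}
    for line in lines:
        record = json.loads(line)
        for f in fields:
            value = record.get(f)
            if value is None:
                nullable[f] = True
            else:
                observed[f].add(type_name(value))
    return {f: resolve(observed[f]) + ("?" if nullable[f] else "") for f in fields}


if __name__ == "__main__":
    raw = [
        '{"id": 1, "temp": 20, "site": "A"}',
        '{"id": 2, "temp": 21.5}',
        '{"id": 3, "temp": 19, "extra": [1]}',
    ]
    # A query touching id, temp and site yields
    # {'id': 'int', 'temp': 'float', 'site': 'string?'}; "extra" is ignored.
    print(infer_schema(raw, ["id", "temp", "site"]))
```

Because the field list comes from the query, two different queries over the same raw file can see two different schemas, which is the essential contrast with schema-on-store.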
The platform, which can run on-premises or be hosted in the cloud, is already being put to good use by a number of large organizations. It supports just-in-time analytics: the ability to query data as the business demands, rather than as the data structures allow. RAW transparently accesses most data stores, including CSV, NoSQL, XML/JSON, RDBMS and log files.
The platform also supports complex queries, such as arbitrarily nested queries, and can join data from a variety of underlying sources in a single query: for example, joining machine logs with asset information from Excel files and maintenance history from a relational database.
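A cross-source query of this kind has to parse each source in its own format and then match records on a shared key. The sketch below hand-codes those steps in Python purely to make them concrete; it is not RAW's query language or engine. It assumes a hypothetical log format (`<timestamp> <asset_id> <level> <message>`), a CSV export standing in for the Excel file, and an in-memory SQLite table `maintenance(asset_id, done_on)` standing in for the relational database. All function, table and column names are illustrative.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Optional


def parse_log(lines: List[str]) -> List[Dict[str, str]]:
    """Parse hypothetical log lines: '<timestamp> <asset_id> <level> <message>'."""
    events = []
    for line in lines:
        ts, asset_id, level, message = line.split(" ", 3)
        events.append({"ts": ts, "asset_id": asset_id, "level": level, "message": message})
    return events


def load_assets(csv_text: str) -> Dict[str, Dict[str, str]]:
    """Index asset rows by asset_id (a CSV export stands in for the Excel file)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["asset_id"]: row for row in reader}


def last_maintenance(conn: sqlite3.Connection) -> Dict[str, str]:
    """Latest maintenance date per asset; ISO date strings sort correctly as text."""
    rows = conn.execute("SELECT asset_id, MAX(done_on) FROM maintenance GROUP BY asset_id")
    return {asset_id: done_on for asset_id, done_on in rows}


def errors_with_context(
    log_lines: List[str], assets_csv: str, conn: sqlite3.Connection
) -> List[Dict[str, Optional[str]]]:
    """Join ERROR log events with asset metadata and last maintenance date."""
    assets = load_assets(assets_csv)
    maint = last_maintenance(conn)
    result = []
    for ev in parse_log(log_lines):
        if ev["level"] != "ERROR":
            continue  # filter before joining
        asset = assets.get(ev["asset_id"], {})
        result.append({
            "ts": ev["ts"],
            "asset_id": ev["asset_id"],
            "model": asset.get("model"),
            "last_maintenance": maint.get(ev["asset_id"]),
            "message": ev["message"],
        })
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE maintenance (asset_id TEXT, done_on TEXT)")
    conn.executemany(
        "INSERT INTO maintenance VALUES (?, ?)",
        [("P1", "2017-01-10"), ("P1", "2017-03-02"), ("P2", "2017-02-15")],
    )
    assets_csv = "asset_id,model\nP1,Pump-200\nP2,Valve-9\n"
    logs = [
        "2017-04-01T10:00 P1 ERROR overpressure",
        "2017-04-01T10:05 P2 INFO ok",
        "2017-04-01T11:00 P2 ERROR leak detected",
    ]
    for row in errors_with_context(logs, assets_csv, conn):
        print(row)
```

The value of a single-query platform is that the user states only the filter and the join condition; the per-source parsing written out by hand here is what such a platform would be expected to handle transparently.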
Use cases include converting unstructured Word documents into structured data, discovering unusual items in very large data volumes, consolidating disparate data sources, and many others. In fact, once a business has access to a platform that can handle high volumes of data from diverse sources, the applications become numerous.
This is a very interesting technology, not least because it employs techniques that are wholly new in this domain. The problems associated with the diversity and volume of data are common to all analysis platforms, so RAW could certainly be positioned as a universal back-end for analytical tools. It would also make a very good acquisition target for one of the large analysis platform vendors.