Treasure Data – Live Data Management


Treasure Data primarily addresses data connection, ingestion and integration issues via a cloud-based service. It also comes with some add-on tools for analysis and the creation of data processing workflows. With the number and variety of data sources most businesses access set to increase, the problem of data ingestion and integration will become more acute. Treasure Data addresses these issues admirably and is fully equipped to deal with real-time streaming data, as well as traditional databases, web applications and most other data sources.

The positioning of Treasure Data has always been difficult, simply because few other vendors address this space. However, it has recently positioned its platform as addressing Live Data Management – a strange label, but nonetheless a meaningful one. With the emergence of live IoT data sources, Treasure Data will increasingly be seen as a solution for integrating these and other data sources for real-time analytics, amongst other things.

More recently the company has raised $25m in Series C funding, allowing it to ramp up sales and marketing efforts and expand the product portfolio – mainly in the form of workflow support and additional analytics and data visualization connectivity.


Treasure Data comes with its own columnar database called the Treasure Data Plazma database. This can be queried using Treasure Data’s own SQL tools, or can be exported to other databases or BI tools. Treasure BI (Business Intelligence) is a lightweight cloud BI option on Treasure Data, for small to medium-sized customers. Users can create reports and dashboards and deliver them to a team. Treasure BI is powered by Slemma, with an optimized and tight integration with Treasure Data. This feature is available as an Add-On.

An agent based on the Fluentd log collector streams data to Treasure Data. This is available as an open source product and as Fluent Enterprise Edition. The company claims support for over 300 data sources, with most data processing tasks executable directly on the console. Treasure Data allows users to issue jobs to process the data and to specify, for each job, which data processing engine to use. Currently, there are three different data processing engines:

  1. Heavy Lifting SQL (Hive)
  2. Scripted Processing (Pig)
  3. Interactive SQL (Presto)
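
The engine choice can be pictured as one field on a job request. The following is a minimal Python sketch of that idea under stated assumptions, not Treasure Data's client API: the `Job` class, the `ENGINES` mapping, the `make_job` helper, the default engine, and the example database and query are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Engine labels taken from the list above, keyed by a short
# hypothetical identifier (not an official API value).
ENGINES = {
    "hive": "Heavy Lifting SQL",
    "pig": "Scripted Processing",
    "presto": "Interactive SQL",
}


@dataclass(frozen=True)
class Job:
    """A hypothetical job request: what to run, where, and on which engine."""
    database: str
    query: str
    engine: str


def make_job(database: str, query: str, engine: str = "presto") -> Job:
    """Build a job request, rejecting engines that are not in ENGINES."""
    if engine not in ENGINES:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {sorted(ENGINES)}"
        )
    return Job(database=database, query=query, engine=engine)


if __name__ == "__main__":
    # Made-up database and query, routed to the batch-oriented engine.
    job = make_job("web_logs", "SELECT COUNT(1) FROM access", engine="hive")
    print(f"{job.engine} ({ENGINES[job.engine]}): {job.query}")
```

Validating the engine name when the request is built, rather than when it runs, is one way to surface a mistyped engine before any data is processed.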

Treasure Data also has a scheduler feature called Scheduled Jobs that supports periodic query execution.

Treasure Workflow allows users to build repeatable data processing pipelines that consist of Treasure Data jobs. It is the hosted and fully managed version of the related open source project Digdag.
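
A pipeline of this kind is a fixed, ordered list of jobs that can be rerun as a unit. The Python sketch below illustrates that general pattern only. It does not use Digdag or Treasure Workflow syntax, and the `run_pipeline` function, the `Step` type, and the step names are hypothetical.

```python
from typing import Callable, List, Tuple

# A step is a name plus a zero-argument callable standing in for one job.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run steps in order and return the names of those that completed.

    Execution stops at the first step that raises, so later steps
    never run against the output of a failed one.
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            print(f"step {name!r} failed: {exc}")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    log: List[str] = []
    # Placeholder steps; in practice each would issue a data processing job.
    pipeline: List[Step] = [
        ("import", lambda: log.append("imported")),
        ("aggregate", lambda: log.append("aggregated")),
        ("export", lambda: log.append("exported")),
    ]
    print(run_pipeline(pipeline))
```

Because the step list is plain data, the same pipeline can be run again unchanged, which is the sense of "repeatable" above.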

Treasure Machine Learning is based on Hivemall, a scalable machine learning library that runs on Apache Hive. Hivemall is designed to be scalable to the number of training instances as well as the number of training features.


It is difficult to compare Treasure Data with other products, simply because it doesn’t fit neatly into any boxes. However, seen as a cloud database with ETL capabilities, it does compare with platforms such as Fivetran, Alooma and Stitch.