Tamr Review Summary
Tamr addresses the seemingly straightforward task of cataloging data sources, identifying attributes, integrating metadata across multiple sources, cleansing data, and publishing a cohesive view of that data to the applications, services and tools that need it. Of course, as anyone who deals with such issues knows, it isn’t straightforward at all. Even a task as simple as deduplication is beset with difficulties. To make the whole job more tractable, Tamr uses machine learning to carry out much of the grunt work. The algorithms work alongside relevant domain experts so that ambiguities and other issues can be resolved.
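To see why even deduplication is hard, consider a few hypothetical spellings of one company name. The Python sketch below is purely illustrative and is not how Tamr works. Exact matching treats every variant as a separate record, so a hand-written `normalize()` rule (an assumption invented for this example) is needed to collapse them.

```python
# Illustrative only: shows why naive deduplication is harder than it looks.
# The records and the normalize() rules are hypothetical, not Tamr behavior.
import re

records = [
    "Acme Corp.",
    "ACME Corporation",
    "Acme Corp",
    "acme corp.",
]


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and expand one common abbreviation."""
    name = re.sub(r"[^\w\s]", "", name.lower()).strip()
    return re.sub(r"\bcorp\b", "corporation", name)


exact_unique = set(records)
normalized_unique = {normalize(r) for r in records}

print(len(exact_unique))       # 4: exact matching sees four "different" companies
print(len(normalized_unique))  # 1: after normalization they collapse to one
```

Hand-coded rules like this do not scale. For instance, they would never link "IBM" to "International Business Machines". Cases like that are the kind of grunt work the review describes Tamr handing over to machine learning.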
At the simplest level, Tamr lets users register data sources in a centralized catalog, facilitates the creation of a unified schema, cleanses the data, and publishes a single version of the truth via a RESTful interface.
Platform
Tamr supports a centralized inventory of enterprise data. It automatically catalogs all metadata available to the enterprise in a central, platform-neutral place. This lets data be grouped by logical entity (customers, partners, employees) rather than by where it is stored, making it easier for companies to find the data they need to answer critical business questions.
It makes it easy to connect data across siloed people, processes and places. Advanced algorithms automatically connect the vast majority of data sources while resolving duplications, errors and inconsistencies among attributes and records. When the system can’t resolve a connection automatically, it asks for human guidance, drawing on people in the organization who are familiar with the data to weigh in on the mapping and improve its quality and integrity. Tamr automatically matches attributes across a full range of data sources, often accomplishing up to 90% of the task without human intervention.
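The workflow described above, where the system matches automatically and falls back to human experts, can be illustrated with a minimal Python sketch. This is a toy built on stated assumptions, not Tamr's algorithm. It scores column names with `difflib.SequenceMatcher` string similarity, accepts matches at or above an arbitrary threshold, and queues the rest for a person. The function names, column names and the 0.75 threshold are all hypothetical.

```python
# Hypothetical sketch of threshold-based attribute (schema) matching with a
# human-review fallback. This is NOT Tamr's algorithm; it only compares
# column names using difflib's SequenceMatcher similarity ratio.
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a column name and treat underscores as spaces."""
    return name.lower().replace("_", " ").strip()


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two column names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_attributes(source_cols, unified_cols, threshold=0.75):
    """Map each source column to its most similar unified attribute.

    Matches scoring at or above `threshold` (an arbitrary, assumed value)
    are accepted automatically; the rest are queued for a domain expert.
    """
    auto, needs_review = {}, []
    for col in source_cols:
        best = max(unified_cols, key=lambda u: similarity(col, u))
        score = similarity(col, best)
        if score >= threshold:
            auto[col] = best
        else:
            needs_review.append((col, best, score))
    return auto, needs_review


# Usage with made-up column names from one source system.
auto, review = match_attributes(
    ["cust_name", "Customer_Email", "dob"],
    ["customer name", "customer email", "date of birth"],
)
print(auto)    # cust_name and Customer_Email are matched automatically
print(review)  # dob -> date of birth scores too low and goes to an expert
```

A real matcher would likely draw on far richer signals than column names, such as value distributions, data types and earlier expert decisions. The overall shape would stay the same: confident matches are applied automatically and uncertain ones are routed to a person.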
Data
Tamr supports both structured and unstructured data. This includes CSV and XLSX files; relational databases (via JDBC); semi-structured data such as JSON, XML and YAML; RDF; and full text (via preprocessing into RDF or semi-structured data). Content management systems such as Documentum, SharePoint and Alfresco are supported, as are big data stores such as HDFS/Hive, Amazon S3/Redshift and Google Cloud Storage/BigQuery, among others.
Tamr presents data, along with its metadata and the results of actions taken in Tamr, in a data inventory. Having the data in one place, along with analysis of the data semantics and connection to other data, makes it easy for users to explore and utilize data that they might otherwise have missed. This data inventory is also integrated with a directory of experts, making it easy to find people within the organization who are able to answer questions about the data.
RESTful APIs and a variety of export formats make it easy to tie Tamr into existing infrastructure, allowing data scientists and business analysts to use familiar software such as QlikTech, Tableau, SAS, IBM Cognos, Recorded Future, Statwing and Zoomdata.
Solutions
Several solutions are offered, including procurement analytics, clinical trials (CDISC) and customer data integration.