These data quality tools are fairly diverse in nature, with some offering a level of ETL functionality and others focusing on functions such as address correction.
Talend’s open source data quality tools are embedded in Talend Open Studio for Data Quality, a popular open source application. Main features include:
- Free to download and use under an Apache license.
- Very easy to learn, with an Eclipse-based graphical workspace geared toward drag-and-drop functionality.
- Versatile enough to work in any IT architecture, with more than 400 built-in connectors that enable easy access to major databases, file formats, and packaged enterprise applications.
- Comprehensive data quality improvement functionality, including support for data standardisation, de-duplication, and enrichment.
- Support for the design and deployment of reusable enterprise data quality services, including real-time data cleansing services to keep up the quality of incoming data.
- A web-based data quality monitoring and reporting portal to help spread data quality awareness and a data quality culture across your enterprise.
The data quality tools in Talend Open Studio for Data Quality allow users, without having to write any code, to perform data quality analysis tasks ranging from simple statistical profiling, to analysis of text fields and numeric fields, to validation against standard patterns (email address syntax, credit card number formats) or custom patterns of a user’s own creation.
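To make the pattern-validation idea concrete, here is a minimal Python sketch of the kind of check such a tool performs. The regexes below are simplified stand-ins, not Talend’s actual pattern library:

```python
import re

# Illustrative patterns only; real tools ship much more thorough
# pattern libraries than these simplified stand-ins.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "visa_card": re.compile(r"4\d{12}(\d{3})?"),  # 13 or 16 digits starting with 4
}

def pattern_match_rate(values, pattern):
    """Fraction of non-empty values that fully match the pattern."""
    values = [v for v in values if v]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)

emails = ["alice@example.com", "bob@example", "carol@example.org"]
print(pattern_match_rate(emails, PATTERNS["email"]))  # 0.666... (2 of 3 valid)
```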
Blazent’s 5-step data evolution process begins with data atomization, which breaks down IT data, regardless of its source, to its most granular level. It then enriches the data with identity management, relationship analysis, purification, and historicity. To create the master source of truth, Blazent integrates with more than 230 discrete data sources, from ITSM systems like ServiceNow to procurement, billing, operational tool stacks, and even shadow IT sources like spreadsheets. Powered by high-performance technologies including ActiveMQ, Cassandra, Hadoop, and Spark, Blazent’s big data engine is optimized for scalability and near real-time data processing.
Ab Initio provides significant data quality tools as part of a broader suite of products for building, running, and integrating enterprise applications. Its end-to-end approach to data quality is based on design patterns using Ab Initio’s coupled technologies, which are all architected together: the Co>Operating System, the Enterprise Meta>Environment (EME), the Business Rules Environment (BRE), and the Data Profiler. Using Ab Initio, a company can implement a complete data quality program including detection, remediation, reporting, and alerting.
Data Ladder’s data quality tools offer very high levels of matching speed and accuracy for the business user at an affordable price. The company recently beat IBM and SAS in matching accuracy and speed for enterprise-level data cleansing in an independent study. The company’s flagship software suite, DataMatch, includes the following features:
- Clean, deduplicate, and match data with advanced technology previously available only in high-end customized software solutions
- Quickly combine customer, vendor, and sales lead information
- Big data capability on data sets up to 100 million records
- Advanced record linking technology provides the ability to create data warehouses
- Quick data profiling tool
- Scalable configurations for deduplication and record linking, suppression, enhancement, extraction, and standardization of business and customer data
DataMatch lets users link and consolidate customer data quickly and easily, and Data Ladder offers a free trial.
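Data Ladder does not publish its matching algorithms, but as a rough illustration of what fuzzy duplicate detection involves, here is a minimal Python sketch using only the standard library. Real products replace the brute-force comparison with blocking and indexing to scale to millions of records:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Normalized similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def find_duplicate_pairs(records, threshold=0.85):
    """Naive O(n^2) pass; enterprise tools use blocking/indexing to scale."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((records[i], records[j]))
    return pairs

customers = ["Acme Corp.", "ACME Corporation", "Globex Inc", "Acme Corp"]
print(find_duplicate_pairs(customers))  # [('Acme Corp.', 'Acme Corp')]
```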
Data Manager is a program which allows you to process and manipulate your data in an easy and logical manner using a graphical interface. It reads and writes delimited files such as comma-separated values (CSV) files, and can also read data from ODBC data sources. It lets you construct a conceptual design for how you are going to process your data and transform it into another form: you build the design by adding functional nodes to a graphical work area and linking them so that the links define the data flow through the nodes.
Each node performs a single function on your data. Once a node completes, it passes your data to the node it is linked to, and the process continues until the data reaches an output node. A design can be simple, or complicated, with hundreds of nodes and multiple input and output nodes.
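Data Manager itself is entirely graphical, but the node-and-link model it describes maps naturally onto code. A minimal Python sketch of the idea, with hypothetical node functions (none of this is Data Manager’s API):

```python
class Node:
    """A single-function processing step; links form the data flow."""
    def __init__(self, func):
        self.func = func
        self.next = None          # downstream node; None makes this an output node

    def link(self, node):
        self.next = node
        return node               # allows chaining: a.link(b).link(c)

    def run(self, rows):
        rows = self.func(rows)
        return self.next.run(rows) if self.next else rows

# A three-node design: input -> filter -> output (all functions hypothetical).
source = Node(lambda rows: rows)                              # input node
keep_nonempty = Node(lambda rows: [r for r in rows if r[1]])  # transform node
sink = Node(lambda rows: rows)                                # output node
source.link(keep_nonempty).link(sink)

print(source.run([("id1", "ok"), ("id2", "")]))  # [('id1', 'ok')]
```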
Datamartist is a fast, easy-to-use visual data profiling and transformation tool. It includes a data profiling tool for analyzing formats, types, completeness, and value counts, so you can understand data quality issues clearly and quickly. Data can be transformed in a graphical ETL environment with a library of different data blocks, and exported to files or directly to databases.
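The profiling metrics Datamartist reports (completeness, types, value counts) are straightforward to compute; as a rough illustration of what a column profile boils down to, here is a plain-Python sketch:

```python
from collections import Counter

def profile_column(values):
    """Basic profile: completeness, inferred types, and top value counts."""
    non_null = [v for v in values if v not in (None, "")]
    types = Counter(type(v).__name__ for v in non_null)
    return {
        "completeness": len(non_null) / len(values) if values else 0.0,
        "types": dict(types),
        "top_values": Counter(non_null).most_common(3),
    }

ages = [34, 28, None, 34, "n/a", 51]
print(profile_column(ages))
# {'completeness': 0.833..., 'types': {'int': 4, 'str': 1},
#  'top_values': [(34, 2), (28, 1), ('n/a', 1)]}
```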
DataPreparator is a free software tool designed to assist with common tasks of data preparation (or data preprocessing) in data analysis and data mining. DataPreparator provides:
- A variety of techniques for data cleaning, transformation, and exploration
- Chaining of preprocessing operators into a flow graph (operator tree), as sketched after this list
- Handling of large volumes of data (since data sets are not held in main memory)
- Stand-alone operation, independent of any other tools
- A user-friendly graphical user interface
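The operator-chaining and out-of-memory claims above translate naturally into a lazy, generator-based pipeline. A minimal Python sketch of that pattern follows (the file name is a placeholder, and this is not DataPreparator’s implementation):

```python
import csv

def read_csv_rows(path):
    """Yield rows one at a time so the full data set never sits in memory."""
    with open(path, newline="") as f:
        yield from csv.reader(f)

def strip_whitespace(rows):
    for row in rows:
        yield [cell.strip() for cell in row]

def drop_empty(rows):
    for row in rows:
        if any(row):
            yield row

# Chain operators into a linear flow; each stage pulls rows lazily
# from the previous one. "input.csv" is a placeholder path.
pipeline = drop_empty(strip_whitespace(read_csv_rows("input.csv")))
for row in pipeline:
    pass  # write out, aggregate, etc.
```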
DQGlobal provides a suite of data quality software including deduplication, data migration, an API with a set of data quality improvement functions, and specific utilities (e.g. formatting addresses).
iManageData is an essential tool for commercial data pre-processing. iManageData helps you create cleaner, more useful information from your data. With its comprehensive selection of data sources, filtering, data conversions and mathematical transformations, iManageData provides quality data for any analytical application.
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; extending it with web services; and linking it to databases like Freebase.
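One of OpenRefine’s best-known cleanup tricks is clustering near-duplicate values by key collision. Here is a rough Python re-creation of the idea behind its fingerprint keying method (not OpenRefine’s actual code): normalize each value to a key, then group values that share a key:

```python
import re
from collections import defaultdict

def fingerprint(value):
    """Fingerprint-style key: lowercase, strip punctuation,
    then sort the unique tokens."""
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(values):
    """Group values whose fingerprints collide; each group is a cluster
    of probable variants of the same underlying value."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

names = ["Ltd. ACME", "Acme Ltd", "acme ltd.", "Globex"]
print(cluster(names))  # [['Ltd. ACME', 'Acme Ltd', 'acme ltd.']]
```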
Paxata is an enterprise platform providing the tools to significantly speed up data cleansing, and offers a contemporary solution that employs a big data infrastructure and automated techniques which exploit machine learning methods. The net result is a self-service data preparation platform that can be used by business analysts and skilled business users to considerably speed up the data preparation task.
The core capability of Paxata leverages Hadoop and specifically Spark, so that large scale in-memory processing is available for the machine learning algorithms that give Paxata much of its power. Paxata can be deployed on premises or accessed as a cloud service. The on-premises deployment requires a Hadoop environment (either dedicated or shared).
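Paxata’s self-service layer hides the code entirely, but the Spark foundation means typical preparation steps run as distributed DataFrame operations. An illustrative PySpark fragment in that style (column names and paths are hypothetical, and this is not Paxata’s API):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative only: a few common preparation steps expressed as
# Spark DataFrame operations, which scale across a cluster.
spark = SparkSession.builder.appName("prep-sketch").getOrCreate()

df = spark.read.csv("customers.csv", header=True)           # placeholder input
cleaned = (
    df.withColumn("name", F.trim(F.lower(F.col("name"))))   # standardize
      .dropDuplicates(["name", "email"])                    # de-duplicate
      .na.drop(subset=["email"])                            # drop incomplete rows
)
cleaned.write.mode("overwrite").parquet("customers_clean")  # placeholder output
```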
Paxata Review: http://butleranalytics.com/paxata-review/
Syncsort offers a variety of data processing tools, including fast data sort, ETL, ETL optimization, and SQL migration, among several others.
DQ Cloud Services from Uniserv is unusual in that it is a cloud-based service. Connectors exist for a wide range of business applications, including Microsoft Dynamics CRM, Oracle Siebel, Salesforce, and SAP Business Suite. It can enhance data in applications (telephone numbers, location, etc.) by accessing public databases; these functions cover bank data checks, email validation, entity titles (businesses or individuals), and address correction.
Uniserv’s Data Quality Service Hub augments and corrects data in many business applications. Correction of address information, email address checking, telephone number checking, and bank data verification are all included. The platform can operate in batch and/or real time. Data Analyzer establishes the current state of data quality, Data Cleansing corrects data, Data Protection ensures updates are adequate and come from authorised people, and Data Governance detects unusual activity.
Tamr provides a sophisticated, but easy to use, data mapping, cleansing, integration and unified data access platform. Underneath the hood there is some very smart technology. Support for big data and semi-structured data types will be welcomed by many users employing those data sources and types.
Tamr Review: http://butleranalytics.com/tamr-review/
TS Quality is the data cleansing and standardization component of the Trillium Software System, a robust, scalable, highly available and easily deployable solution for mission-critical enterprise data quality. Trillium’s data quality services deploy in batch or real-time through an on-site or hosted solution, using the same rule sets and standards across an unlimited number of applications and systems.
WinPure’s award-winning Clean & Match data quality software provides advanced data matching and sophisticated data cleansing at an affordable price for any business size. Clean & Match has been specially designed to be used by anyone, not just IT professionals. Combining a simple-to-use interface with powerful features, it’s ideal for cleaning, correcting, and deduplicating mailing lists, databases, spreadsheets, and CRMs. Main features include:
- Fast fuzzy matching technology to identify duplicate records, previously only available in enterprise software.
- Real-time statistics with 3D charts and a scoring system to help populate missing data.
- Complete suite of data quality tools including standardizing, profiling, filtering and de-duplication.
- Works with business and consumer data, local and international.
- Scalable editions for any business size.
- World-class customer support.
WinPure offers a free 21-day trial so you can try the software using your own data.