Provalis Research announced the integration of WordStat, the leading text analysis software, with Stata, the popular data analysis and statistical software.
This new collaboration couples Stata’s cutting-edge numerical analysis with the unique text analytics functionality of Provalis Research’s WordStat. The combined technologies will enable business analysts and researchers to perform thorough statistical analyses and to process unstructured data faster and more accurately.
The business and academic communities have long needed statistical tools that can analyze both structured and unstructured data, but tools offering this combination have so far been uncommon. Many disciplines that rely on mixed-methods research, such as economics, sociology, psychology and political science, face the same challenge.
Researchers need to analyze text data to identify topics while simultaneously determining how those topics relate to other data components. Text mining addresses this challenge in many ways by enabling sophisticated indexing that “turns text into numbers”, which can then be incorporated into other analyses. Typical text mining tasks include text categorization, comparative analysis, topic modeling, entity relation analysis, and automatic document classification.
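To make the idea of “turning text into numbers” concrete, here is a minimal, generic sketch of the simplest such indexing, a bag-of-words document-term matrix, written in plain Python. It is not WordStat’s method or API: the helper names `tokenize` and `document_term_matrix`, the simple letters-only tokenizer, and the two toy documents are all hypothetical, chosen only for illustration. Each row counts how often each vocabulary word appears in one document, so the resulting numbers could be merged with other variables for statistical analysis.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase, keep runs of letters
    return re.findall(r"[a-z]+", text.lower())

def document_term_matrix(docs):
    """Return (vocabulary, rows) where rows[i][j] counts vocabulary[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    # A sorted vocabulary gives every document the same, stable column order
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for words a document does not contain
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows

# Toy documents (illustrative only)
docs = [
    "Prices rose and wages rose.",
    "Wages fell.",
]
vocab, matrix = document_term_matrix(docs)
print(vocab)   # ['and', 'fell', 'prices', 'rose', 'wages']
for row in matrix:
    print(row)  # [1, 0, 1, 2, 1] then [0, 1, 0, 0, 1]
```

Real text analytics tools typically go much further (stemming, stop-word removal, dictionaries, weighting such as TF-IDF), but the principle of mapping each document to a row of numbers is the same.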
To better serve their clients, Provalis Research and Stata have collaborated to build an integrated solution that leverages both Stata’s statistical power and WordStat’s text analysis tools. With Provalis Research’s technology, Stata users can now import documents in various file formats and automatically extract numerical, categorical, or date variables from structured documents for effective text analytics.
“It means that someone can import news transcripts, reports, and extract not only the text, but also dates, numbers, etc.,” explains Normand Péladeau, Provalis Research’s CEO. Once documents are processed, WordStat’s exploratory text analysis functions give researchers powerful tools to extract themes and automatically identify patterns. These include topic modeling, which lets users quickly get an overview of the most salient topics in large text collections.