DatumBox provides a cloud-based machine learning platform with 14 separate areas of functionality, much of which is relevant to text analytics. The functions are called via a REST API and address the following types of application:
- Sentiment Analysis – classifies documents as positive, negative or neutral.
- Twitter Sentiment Analysis – specifically targeted at Twitter data.
- Subjectivity Analysis – classifies documents as subjective (personal opinions) or objective.
- Topic Classification – assigns documents to 12 thematic categories.
- Spam Detection – labels documents as spam or nospam.
- Adult Content Detection.
- Readability Assessment – based on terms and idioms.
- Language Detection.
- Commercial Detection – commercial or non-commercial based on keywords and expressions.
- Educational Detection – based on context.
- Gender Detection – written by or targeting men/women based on words and idioms.
- Keyword Extraction.
- Text Extraction – extraction of important information from a web page.
- Document Similarity – to detect web page duplicates and plagiarism.
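
Because every function in the list above is exposed through the same REST API, a single small client can cover all of them. The following is a minimal sketch in Python, not an official client. The base URL, the `<Function>.json` endpoint naming, the `api_key` and `text` form fields, and the `{"output": {"status": ..., "result": ...}}` response shape are assumptions made for illustration and should be checked against the DatumBox API documentation. The helper names (`build_request`, `parse_response`, `call_datumbox`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL and version; verify against the DatumBox documentation.
BASE_URL = "http://api.datumbox.com/1.0"


def build_request(function, api_key, text):
    """Return (url, form-encoded body) for one API function.

    The endpoint pattern and field names are assumptions, not confirmed
    by the text above.
    """
    url = f"{BASE_URL}/{function}.json"
    body = urllib.parse.urlencode({"api_key": api_key, "text": text})
    return url, body.encode("utf-8")


def parse_response(raw):
    """Extract the result from an assumed JSON payload of the form
    {"output": {"status": 1, "result": "positive"}}."""
    output = json.loads(raw).get("output", {})
    if output.get("status") != 1:
        raise RuntimeError(f"API call failed: {output.get('error')}")
    return output["result"]


def call_datumbox(function, api_key, text, send=None):
    """POST the text to one function and return its result.

    `send` is a hypothetical hook taking (url, body) and returning the raw
    response string; it lets the HTTP layer be replaced, e.g. in tests.
    """
    url, body = build_request(function, api_key, text)
    if send is None:
        def send(url, body):
            request = urllib.request.Request(url, data=body)  # data => POST
            with urllib.request.urlopen(request, timeout=10) as response:
                return response.read().decode("utf-8")
    return parse_response(send(url, body))
```

With a valid key, a call such as `call_datumbox("SentimentAnalysis", api_key, "I love this phone")` would be expected to return one of the labels described above. Keeping the function name as a parameter means the same code path can serve Spam Detection, Language Detection and the other functions without a separate wrapper for each.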