Magnum Opus is a highly pragmatic solution to many association rule mining needs and provides a useful compromise between ‘black-box’ solutions which filter rules by unknown mechanisms, and those which simply churn out thousands of association rules – most of which will be meaningless.
Several features distinguish Magnum Opus, including speed (processing time scales linearly with data size), ease of use, the filtering mechanisms and the simple representation of rules. It will address the needs of many organizations without all the fuss of a large data mining suite. A demo version that processes datasets of up to 1000 cases can be downloaded, and a 10-day free trial of the full, unrestricted product is also available.
The central technical philosophy behind Magnum Opus is the use of k-optimal (top-k) association discovery techniques. These allow the user to specify which associations are desirable using standard measures such as lift, leverage, strength, support and coverage, and the search then returns the k associations that score best on the chosen measure.
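To make the five measures concrete, here is a minimal Python sketch that computes them for a single rule over a toy set of transactions. It is an illustration only, not Magnum Opus code: the function name rule_measures and the sample data are hypothetical, and the definitions used are the commonly cited ones (coverage is the proportion of cases matching the LHS, support the proportion matching both sides, strength is support divided by coverage, lift is strength divided by the frequency of the RHS, and leverage is support minus the support expected if the two sides were independent).

```python
def rule_measures(transactions, lhs, rhs):
    """Compute common interest measures for the rule lhs -> rhs.

    transactions: list of sets of items (hypothetical toy data).
    lhs, rhs: iterables of items forming the two sides of the rule.
    """
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")
    lhs, rhs = frozenset(lhs), frozenset(rhs)

    # Count the cases that contain the LHS, the RHS, and both together.
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)

    coverage = n_lhs / n        # proportion of cases matching the LHS
    support = n_both / n        # proportion of cases matching LHS and RHS
    rhs_freq = n_rhs / n        # proportion of cases matching the RHS
    strength = support / coverage if coverage else 0.0  # also called confidence
    lift = strength / rhs_freq if rhs_freq else 0.0
    leverage = support - coverage * rhs_freq

    return {
        "coverage": coverage,
        "support": support,
        "strength": strength,
        "lift": lift,
        "leverage": leverage,
    }


baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]
measures = rule_measures(baskets, {"bread"}, {"butter"})
```

For the rule bread -> butter in this toy data, three of the five baskets contain bread and two contain both items, so coverage is 0.6, support is 0.4 and strength is about 0.67. A lift above 1 and a leverage above 0 both indicate that the two sides occur together more often than independence would predict.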
Data
Text files are accepted for input but there is no support for direct connection with database systems. A distinction is made between transaction data where, for example, each transaction is simply a collection of items purchased, and attribute-value data typical of data stored in rows in a database (customer details for example). A data import wizard addresses the mechanics of getting data into the program.
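The difference between the two kinds of data can be illustrated with a short Python sketch. It does not reproduce Magnum Opus's file formats or its import wizard; the function rows_to_transactions and the sample records are hypothetical. It only shows the general idea that an attribute-value row can be viewed as a set of attribute=value items, which puts it in the same shape as a basket of purchased items.

```python
def rows_to_transactions(rows):
    """Turn attribute-value rows into sets of 'attribute=value' items.

    rows: list of dicts, one per database row (hypothetical sample data).
    Missing values (None) are skipped rather than turned into items.
    """
    return [
        frozenset(
            f"{attribute}={value}"
            for attribute, value in row.items()
            if value is not None
        )
        for row in rows
    ]


# Transaction data: each case is simply a collection of items.
baskets = [
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
]

# Attribute-value data: each case has a value for each named attribute.
customers = [
    {"age_band": "30-39", "region": "north"},
    {"age_band": "40-49", "region": None},
]
customer_items = rows_to_transactions(customers)
```

After the conversion, both data sets are lists of item sets, so the same association measures can be applied to either.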
Filtering
Five filter options are provided to filter out rules and itemsets that are unlikely to be of interest: Filter-Out None, Filter-Out Redundant, Filter-Out Unproductive, Filter-Out Insignificant and Filter-Out Unsound. The details of these modes are too involved to explain here, but they provide the mechanisms to produce meaningful rules and itemsets that will be of interest to the user.
Search Settings
A substantial number of search settings are available, providing mechanisms to specify minimum lift values, minimum values for support, coverage and strength, and the values allowed on the LHS and RHS of rules. The criterion used to rank the best associations can also be specified.
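The interaction between minimum thresholds and a ranking criterion can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the product's search algorithm: the function top_k_rules, its parameter names and the sample rules are all hypothetical, and it filters a list of already scored rules, whereas a real k-optimal search would typically apply such constraints while exploring candidate rules.

```python
import heapq


def top_k_rules(rules, k, criterion="leverage", min_lift=1.0,
                min_support=0.0, min_coverage=0.0, min_strength=0.0):
    """Return the k rules with the highest value of `criterion`
    among those that meet every minimum threshold.

    rules: list of dicts holding precomputed measures (made-up sample data).
    """
    eligible = [
        r for r in rules
        if r["lift"] >= min_lift
        and r["support"] >= min_support
        and r["coverage"] >= min_coverage
        and r["strength"] >= min_strength
    ]
    # nlargest returns the results sorted from best to worst.
    return heapq.nlargest(k, eligible, key=lambda r: r[criterion])


scored_rules = [
    {"rule": "A -> B", "lift": 1.5, "support": 0.20, "coverage": 0.4,
     "strength": 0.5, "leverage": 0.067},
    {"rule": "C -> D", "lift": 0.9, "support": 0.30, "coverage": 0.5,
     "strength": 0.6, "leverage": -0.033},
    {"rule": "E -> F", "lift": 2.0, "support": 0.05, "coverage": 0.1,
     "strength": 0.5, "leverage": 0.025},
    {"rule": "G -> H", "lift": 1.2, "support": 0.30, "coverage": 0.5,
     "strength": 0.6, "leverage": 0.050},
]
best = top_k_rules(scored_rules, k=2, min_support=0.1)
```

Here the rule C -> D is excluded by the minimum lift and E -> F by the minimum support, so the two rules returned are A -> B and G -> H, ranked by leverage.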
Statistical Tests
The probability of finding associations in data that are nothing more than random accidents is reduced by using two filters. The unsound filter does not use the classical 0.05 (5%) significance level; this would not make sense when billions of patterns are processed, because a great many patterns would pass the test by chance alone. Instead it uses the Bonferroni correction, which takes account of the number of combinations being considered, and in most practical association discovery processes the resulting threshold becomes a very small number, typically less than ten to the power of minus twenty. In practice this can be too conservative, so the data can instead be split into two sets: exploratory and holdout. Associations are discovered using the exploratory set and then subjected to statistical tests in the holdout data. Which test is used depends on data size as much as anything, and the holdout test may be preferable for smaller data sets.
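The arithmetic behind the Bonferroni correction is simple enough to sketch in Python. The function name bonferroni_threshold and the pattern counts are hypothetical, chosen only to show the order of magnitude involved. The second call assumes that, in the holdout approach, the correction is applied to the number of associations carried forward from the exploratory set, which is the usual way such a test is set up but is not spelled out above.

```python
def bonferroni_threshold(alpha, num_patterns):
    """Per-pattern significance threshold under the Bonferroni correction:
    the overall level alpha divided by the number of patterns tested."""
    if num_patterns < 1:
        raise ValueError("num_patterns must be at least 1")
    return alpha / num_patterns


# Direct adjustment: every pattern in a very large search space is counted.
# 5 * 10**18 is an illustrative figure, not a measured one.
direct = bonferroni_threshold(0.05, 5 * 10**18)   # roughly 1e-20

# Holdout evaluation (assumed setup): only the associations found in the
# exploratory data, say 1000 of them, are tested on the holdout data.
holdout = bonferroni_threshold(0.05, 1000)        # roughly 5e-05
```

The contrast between the two thresholds shows why direct adjustment can be conservative: a pattern must reach a far smaller p-value to be accepted than it would in a holdout test over a modest number of candidates, although the holdout test is run on only part of the data.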
About GI Webb Associates
Geoff Webb is a research professor at Monash University, and Magnum Opus is a good distillation of advanced concepts into an easy-to-use product.