Decision tree classifiers are widely used because of the visual and transparent nature of the decision tree format. They can, however, suffer badly from overfitting, particularly when many attributes are used with a limited data set. The list of free decision tree classification software below includes full data mining platforms such as KNIME, RapidMiner and Orange, as well as some stand-alone libraries. BigML differs in that it is a cloud-based service with a fairly generous free subscription.
BigML is a commercial, cloud-based platform with a generous free subscription. It offers an attractive interface and supports both single decision tree classifiers and random forests. Models can also be accessed via an API.
KNIME is a visual data mining platform available in open source and commercial versions. Its decision tree classifier handles a nominal class attribute, while the predictor attributes may be nominal or numerical. The algorithm offers two quality measures for split calculation: the Gini index and the gain ratio. In addition, a post-pruning method based on the minimum description length principle can be applied to reduce the tree size and improve prediction accuracy.
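As an illustration, the two split quality measures can be sketched in pure Python. This is a toy sketch, not KNIME code; the function names and example labels are invented for this example:

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of a node's class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent, partitions):
    """Information gain of a split divided by its intrinsic (split) information."""
    n = len(parent)
    remainder = sum(len(p) / n * entropy(p) for p in partitions)
    gain = entropy(parent) - remainder
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions)
    return gain / split_info if split_info else 0.0

labels = ["yes", "yes", "no", "no", "no"]
left, right = ["yes", "yes"], ["no", "no", "no"]
print(round(gini(labels), 3))                      # → 0.48
print(round(gain_ratio(labels, [left, right]), 3)) # → 1.0
```

A split that separates the classes perfectly, as above, has a gain ratio of 1.0; the learner picks the candidate split that scores best under the chosen measure.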
Orange includes multiple implementations of classification tree learners: a very flexible TreeLearner, a fast SimpleTreeLearner, and a C45Learner, which uses the C4.5 tree induction algorithm.
PC4.5 uses C4.5 to build decision trees on parallel hardware architectures and runs on SunOS, Solaris and Linux machines. In an N-trial C4.5 run, a single process builds N classification trees one by one and then picks the best one. In PC4.5, each of the N trials is handled by its own process, and each process runs on a different machine (if N or more machines are available).
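The trial-per-worker, pick-the-best pattern can be sketched as follows. This is a hypothetical illustration only: it uses threads rather than processes on separate machines, and a deterministic stand-in scoring function in place of an actual C4.5 trial:

```python
from concurrent.futures import ThreadPoolExecutor

def run_trial(seed):
    # Stand-in for one C4.5 trial: in PC4.5 this would build a tree
    # and estimate its accuracy. Here a deterministic toy score
    # replaces real training so the sketch is self-contained.
    score = (seed * 37 % 100) / 100.0
    return seed, score

N = 10
with ThreadPoolExecutor(max_workers=N) as pool:
    results = list(pool.map(run_trial, range(N)))

# Keep the trial whose tree scored best, as the N-trial run would.
best_seed, best_score = max(results, key=lambda r: r[1])
print(best_seed, best_score)  # → 8 0.96
```

The point of the pattern is that the N trials are independent, so the wall-clock cost approaches that of a single trial when enough workers (or machines) are available.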
RDataMining is an R package that performs most common data mining functions, including decision tree classification.
RapidMiner comes in both open source and commercial versions. Its decision tree support covers classification of both nominal and numeric data, with splitting based on information gain, the Gini index or the gain ratio. Several parameters control the creation of the tree, including pruning parameters. A CHAID (Chi-squared Automatic Interaction Detector) operator is also included, which uses a chi-squared splitting criterion.
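The chi-squared statistic behind CHAID-style splits can be illustrated with a small sketch (not RapidMiner code; the contingency table is invented, with one row per split branch and one column per class):

```python
def chi_square(table):
    """Pearson chi-squared statistic for a split's contingency table.

    table: rows are the split's branches, columns are class counts.
    Compares observed cell counts against the counts expected if the
    split were independent of the class.
    """
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    total = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / total) ** 2
        / (row_totals[i] * col_totals[j] / total)
        for i in range(len(table))
        for j in range(len(col_totals))
    )

# A branch split that separates the two classes perfectly:
print(chi_square([[10, 0], [0, 10]]))  # → 20.0
# A split independent of the class scores 0:
print(chi_square([[5, 5], [5, 5]]))    # → 0.0
```

CHAID prefers splits with larger statistics (smaller chi-squared p-values), i.e. splits whose branch membership is most strongly associated with the class.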
Scikit-learn is a machine learning library for Python. It includes a large number of methods and algorithms, and decision tree classification is well catered for. Its documentation discusses ID3, C4.5, C5.0 and CART; the implementation itself uses an optimised version of the CART algorithm.
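A minimal example with scikit-learn's DecisionTreeClassifier is shown below; the parameter choices (Gini criterion, depth limit) are illustrative rather than recommended defaults:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small built-in data set for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# criterion can be "gini" or "entropy"; max_depth limits overfitting.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out data
```

Capping the tree depth (or setting min_samples_leaf) is the usual guard against the overfitting noted in the introduction.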
SMILES is a machine learning system that integrates features from many other machine learning techniques and paradigms and adds several innovations of its own. In particular, SMILES extends classical decision tree learners in many ways: new splitting criteria, non-greedy search, new partitions, and the extraction of several different solutions. It also offers anytime handling of resources and sophisticated, quite effective handling of misclassification and test costs.
WEKA is often incorporated into other data mining and analytics platforms, KNIME and RapidMiner for example. It contains a large number of decision tree classifiers, about a dozen in all.
YaDT is a new from-scratch implementation of the entropy-based tree construction algorithm. It has been designed and implemented in C++ with a strong emphasis on efficiency (in time and space) and portability (Windows/Linux, 32/64-bit executables).