Ensemble learning consists of creating multiple models and combining them to (hopefully) produce a composite model that outperforms any individual component. Several commentators have called this the most significant development in machine learning of the last decade.
Building predictive models has traditionally involved training several candidate models on the entire training data set and, using validation techniques (typically cross-validation), picking the best-performing one. In reality, however, a single model may well be unstable (a slightly different training set can produce a significantly different model), and it may not capture all the useful patterns that might be found in the data.
Ensemble learning exploits some of the weaknesses of the single-model approach; model instability, in particular, can be turned to good effect. There are other surprises too, such as deliberately introducing random modifications into the learning algorithm, which can yield remarkably accurate composite models.
A number of techniques have evolved to support ensemble learning, the best known being bagging and boosting. Bagging (Bootstrap Aggregating) assigns a vote to each model (in classification problems) and chooses the class with the most votes; regression problems are handled by averaging the model outputs. Each model is built on a random sample of instances drawn, with replacement, from the training set, so that each model trains on a different set of instances. Curiously enough, this works best with unstable modelling techniques such as decision trees, whereas stable methods such as nearest neighbour typically gain little from bagging.
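As a rough sketch of how this might look in code (assuming Python with scikit-learn decision trees as the base learner and NumPy arrays with integer class labels; the function names here are purely illustrative), bagging amounts to little more than bootstrap sampling plus a majority vote:

```python
# Illustrative bagging sketch: each tree is trained on a bootstrap sample
# (drawn with replacement) and classification is decided by majority vote.
# Assumes X and y are NumPy arrays and class labels are non-negative integers.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)          # bootstrap sample
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])   # one row per model
    # For each instance, return the class receiving the most votes
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```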
Boosting is similar to bagging in that it uses a voting mechanism, although the votes are weighted so that the better models carry more influence. The training sets for each model, however, are derived in a wholly different manner: as models are built, emphasis is placed on the instances that are incorrectly classified, and these are given priority in the training sets of subsequent models. Gradient boosting is a variation of this approach used in regression problems. Finally, a technique known as randomisation varies the parameters associated with a learning method at random. Neural networks, for example, require a random seed to set the initial node weights, and this seed can be varied as each model is built. Since almost all learning methods come with some form of parameterisation, randomisation has wide applicability.
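As a concrete illustration (using scikit-learn and synthetic data purely for the sake of example; the text itself does not prescribe a library), both flavours of boosting are available off the shelf:

```python
# AdaBoost reweights misclassified training instances between rounds, while
# gradient boosting fits each new regression tree to the residual errors.
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingRegressor

Xc, yc = make_classification(n_samples=500, random_state=0)
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(Xc, yc)
print("boosted classifier accuracy:", clf.score(Xc, yc))

Xr, yr = make_regression(n_samples=500, noise=10.0, random_state=0)
reg = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(Xr, yr)
print("gradient boosting R^2:", reg.score(Xr, yr))
```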
Random forest is perhaps the best-known example of these techniques and is widely supported, although the original work was first commercially implemented by Salford Systems. A random forest is built from a set of randomised decision trees combined using the bagging algorithm (although variations do exist).
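The sketch below shows that combination in practice, again assuming scikit-learn and synthetic data for illustration: bootstrap sampling supplies the bagging, and restricting each split to a random subset of features supplies the tree randomisation:

```python
# A random forest: bagged decision trees, each split considering only a
# random subset of the available features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=25, random_state=0)
forest = RandomForestClassifier(
    n_estimators=200,      # number of randomised trees in the ensemble
    max_features="sqrt",   # random subset of features tried at each split
    bootstrap=True,        # bagging: each tree trains on a bootstrap sample
    random_state=0,
)
scores = cross_val_score(forest, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```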
Tools which support ensemble methods include:
Open Source / Free
Orange
R
RapidMiner (Random Forest)
WEKA
Commercial
BigML (Decision tree ensembles)
Salford Systems
Statistica
wise.io (very fast random forest)