It sounds fairly impressive, but is actually quite simple. The quadratic loss function gives a measure of how accurate a predictive model is. It works by taking the difference between the predicted probability for each class and the actual outcome, so it is used on classification schemes which produce probabilities (Naive Bayes, for example).

The word ‘quadratic’ means that the highest-order term in the function is a square. Squaring makes sure every difference contributes a non-negative amount, and it penalizes large errors more heavily than small ones. The term ‘loss’ is self-descriptive: it is a measure of the loss of accuracy. And finally, it is a function.

When in use it gives preference to predictors that are able to make the best guess at the true probabilities. It is often used as the criterion of success in probabilistic prediction situations.

#### The Math

For a single instance in the dataset, assume there are k possible outcomes (classes). The probability vector p1, p2, …, pk gives the predicted probability that the instance belongs to each of the k classes.

The actual outcome is represented by a vector a1, a2, …, ak in which exactly one component (the ith) is 1, marking the class the instance actually belongs to. All the other values in the vector are zero.

The quadratic loss function is most simply expressed by:

$$\sum_{j=1}^{k} (p_j - a_j)^2$$

In practice it gets more involved than this, but we won’t go there.
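
To make the definition concrete, here is a minimal sketch in Python. It assumes the simplest form of the loss for one instance: the sum, over all k classes, of the squared difference between the predicted probability and the actual value (1 for the true class, 0 for the rest). The function name `quadratic_loss` and the example probabilities are made up for illustration and do not come from any particular library.

```python
def quadratic_loss(probs, actual_index):
    """Quadratic loss for a single instance (illustrative helper).

    probs        -- predicted probabilities p1..pk, one per class
    actual_index -- zero-based position of the class the instance belongs to
    """
    if not 0 <= actual_index < len(probs):
        raise ValueError("actual_index must refer to one of the classes")

    total = 0.0
    for j, p in enumerate(probs):
        # The actual vector a is 1 for the true class and 0 everywhere else.
        a = 1.0 if j == actual_index else 0.0
        total += (p - a) ** 2
    return total


# Three hypothetical predictions for an instance whose true class is the first one.
confident_right = quadratic_loss([0.8, 0.1, 0.1], 0)
hedged = quadratic_loss([0.4, 0.3, 0.3], 0)
confident_wrong = quadratic_loss([0.1, 0.8, 0.1], 0)

print(confident_right, hedged, confident_wrong)
```

Lower is better. Under this form a perfect prediction scores 0 and the worst possible one (all the probability placed on a single wrong class) scores 2. The confident and correct prediction gets the smallest loss of the three, the hedged one lands in the middle, and the confident but wrong one is penalized the most.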