In the heady world of statistical analysis nothing is ever as it seems. Logistic regression is not regression in the sense the word usually carries in data mining, and the term is misleading. But just as some cavalry units still have a fifth man to look after the horses, despite the fact that there are no horses any more, so the term regression will continue to be used here long after it has stopped describing what the method does. In data mining the term regression defines a category of methods (linear regression being the best known) which are used to predict numerical values, almost always continuous ones. Logistic regression is used to classify data, and in its most heavily used form the categories are binary: yes/no, pass/fail and so on. Hence its use in credit ratings.
The term logistic comes from the fact that the logistic function, well loved by people who study population growth, is used to convert the result of a calculation into a form of probability. Below is a diagram displaying a set of data showing a fictitious relationship between salary, age and the propensity to be a good or bad credit risk (G for good, B for bad).
The long red line (often called a linear discriminant) separates the good credit risks (G) from the bad (B). What is important here, from a logistic regression perspective, are the two shorter red lines pointing to two instances of a B credit risk. One is very near to the line and the other is further away. We would expect that the further a point is from the line, the more certain it is that the credit risk will be bad (in this example), and as such we want to give it a higher probability. This is where the logistic function comes in. It very conveniently transforms values which range from minus infinity to plus infinity into the range zero to one, which happily coincides with the range of a probability. A point lying exactly on the line is mapped to 0.5, and points further away from the line are given a higher probability of being in their respective category.
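To make the idea concrete, here is a minimal Python sketch of the step described above. It is an illustration under stated assumptions, not a fitted model: the weights, the bias and the two applicants are hypothetical values picked by hand, salary is taken to be in tens of thousands and age in decades, and a positive score is taken to mean the "bad" side of the line. In real logistic regression the weights would be estimated from the data. The score used here is proportional to the distance from the line rather than the distance itself.

```python
import math


def logistic(z: float) -> float:
    """Map any real number to a value between 0 and 1."""
    # Two branches so that math.exp is only ever given a non-positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(weights, bias, point):
    """Linear score w.x + b. Its sign says which side of the line the point
    is on, and its size grows with the distance from the line."""
    return sum(w * x for w, x in zip(weights, point)) + bias


# Hypothetical, hand-picked line (not fitted to any data).
WEIGHTS = (-1.0, -0.5)  # (salary in tens of thousands, age in decades)
BIAS = 6.0

near_line = (3.5, 4.0)      # a bad risk lying close to the line
far_from_line = (1.0, 2.0)  # a bad risk lying well away from it

p_near = logistic(score(WEIGHTS, BIAS, near_line))
p_far = logistic(score(WEIGHTS, BIAS, far_from_line))

print(f"near the line:     P(bad) = {p_near:.3f}")
print(f"far from the line: P(bad) = {p_far:.3f}")
```

Both points fall on the bad side, so both probabilities are above 0.5, but the point further from the line gets the higher value of the two.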
Of course it all gets much more involved, but this is the gist of logistic regression – a very useful method for classification.
Logistic regression functionality can be found in the following free data mining and statistical analysis platforms: