A moment of light relief for all those struggling to make sense of big data, machine learning, statistics and predictive analytics.
Shown below are four records from a survey of 2000 people conducted by the marketing department. The data will be used to better target prospects:
The analysts start to clean the data and engineer the features that will produce the best predictive model. The lead analyst presents his findings:
“We have isolated four composite features that give very good results using a neural network.”
F1 = AGE x NAME – FAV COLOR
F2 = SEX x SALARY x CAR
F3 = MARRIED – AGE
F4 = FAV COLOR
He insisted that these were not as silly as they looked, because all attributes had been transformed into numbers using the Tchaikovsky-Einstein algorithm.
The neural network had only 200 hidden neurons, and after running for two days on a cluster of 400 servers it had spewed out five models that gave a p-value of just 0.01 on test data. A fine statistical result by any measure.
Here is the lift this model gave for training (1), test (2) and live data (3) using the latest data visualization tools (please note the 3D histogram).
Obviously senior management was not pleased. A postmortem of the predictive model showed that you can never trust people who like the color red.
In time-honored tradition, management blamed the consultants, the consultants blamed the data scientists, and the data scientists blamed management. In the end they all agreed to blame the technology supplier, and they all lived happily ever after.