Monday, June 23, 2008

"Statistics are no substitute for judgment." Henry Clay

Why is it that statistics have such a hold on our brains? In the '80s, businesses could lure investors simply by displaying their data on computer-generated graphs. The classic book How to Lie with Statistics, by Darrell Huff (W.W. Norton & Co., New York, 1954; reissued in 1982 and 1993), gives us some insight into the way our minds interpret data.* The book is really about how we hear and see data portrayed.

Boiling it down, our eyes make judgments about the light that is hitting them. Have you ever been driving down the interstate, not paying particular attention to the cars passing in the opposite direction, when suddenly you spot a police car? You weren't looking for one, and perhaps you weren't even speeding. But over the years you have trained your eyes to alert you when a vehicle appears with little lights on top. You may have had the same experience only to find that it was a car with a ski or bike rack on top. So we have perception filters which, while generally helping us, also shape our reality, often unconsciously.

Statistics tend to be shorthand for important information. They also tend to be numeric, so they are easily portrayed with graphs and charts. So anyone trying to magnify the difference between numbers will use a three-dimensional object rather than a line or even a bar, because we 'see' images as objects. After all, we never really 'see' anything but images, though we infer a 3D world around us.
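To put a number on the three-dimensional trick, here is a minimal Python sketch. It assumes the artist scales every dimension of the picture by the ratio of the two values, which is the usual way such pictograms are drawn. The function name and the 2-to-1 ratio are my own illustration, not taken from Huff's book.

```python
def perceived_ratio(value_ratio: float, dimensions: int) -> float:
    """Apparent size ratio when each dimension of a picture is scaled by value_ratio."""
    return value_ratio ** dimensions


# Hypothetical case: one number is twice the other.
true_ratio = 2.0

for dims, label in [(1, "bar height"), (2, "flat picture area"), (3, "3D object volume")]:
    apparent = perceived_ratio(true_ratio, dims)
    print(f"{label}: looks {apparent:.0f} times as large")
```

A bar shows the honest two-to-one difference. A flat picture scaled the same way covers four times the area, and a drawn object implies eight times the volume.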

Whenever you attempt to interpret summary information (and it is not generally possible to understand any amount of raw data), remember to ask yourself these three important questions.

What is the point? The graph is included to make a point. Make sure you understand the point before you look at the illustration of that point. Ask yourself whether there are good reasons to think the point might be correct before you look at the supporting data.

Who wins? Always ask who stands to benefit substantially if this point is correct.

Who wants what? Motives (especially unexamined ones) always shape the interpretation of data. Never assume that scientists or statisticians are without motives, even corrupt ones.


* For a quick summary of the types of information in the book

Thursday, June 12, 2008

There are three kinds of lies: lies, damned lies, and statistics.

I still remember my first seminar on modeling. I was shown how, with proper statistical techniques applied by a Ph.D. statistician, one could find the top 20% of the customers who produced 80% of the profits in a mailing. Neiman Marcus ran the tests, and the graphics were impressive. However, questions arose in my mind, so I raised my hand. "You show very dramatic improvements over your control mailing. What customer segmentation method did you use for comparison?"

"A random sample," was the brusque reply. The answer, coming from a doctor of statistics, probably went right by most of the audience. I knew, on the other hand, that just selecting the most recent 0-3 month buyers would probably generate similar if not better results. Adding the 3-6 and 6-12 month segments to the comparison would have ruined the beautiful presentation.

And those many years ago, I learned a very valuable lesson. It isn't just about testing; it is about test design and integrity. Direct marketing does offer the possibility of learning. But it also offers the opportunity for manipulation and statistical deception. Next time you are listening to a public presentation about the magic of statistics or database segmentation or offer personalization, remember: if the numbers are detailed, the client probably isn't present; if they are not detailed, the testing was probably not valid.
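The seminar story comes down to which baseline the lift is measured against. The Python sketch below is only an illustration: every response rate in it is hypothetical, invented to show the arithmetic, and none of it comes from the Neiman Marcus tests.

```python
def lift(test_rate: float, baseline_rate: float) -> float:
    """Relative improvement of test_rate over baseline_rate."""
    return (test_rate - baseline_rate) / baseline_rate


# All response rates are hypothetical, for illustration only.
model_rate = 0.060   # customers picked by the statistical model
random_rate = 0.020  # control: a random sample of the file
recent_rate = 0.055  # control: the most recent 0-3 month buyers

lift_vs_random = lift(model_rate, random_rate)
lift_vs_recent = lift(model_rate, recent_rate)

print(f"Lift over a random sample:   {lift_vs_random:.0%}")
print(f"Lift over 0-3 month buyers:  {lift_vs_recent:.0%}")
```

The same model result looks spectacular against a random sample and unremarkable against a simple recency rule, which is why the choice of control belongs in the presentation.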

We have been told for decades how we can make money with data. The truth is not so simple.


An excellent article on the problems with digital data: http://www.lewrockwell.com/giles/giles22.html