The direct marketing magazines are loaded with articles on the benefits of better understanding your customers. These suggestions get paraded out especially in difficult economic times. Rarely is much detail offered, particularly about how many or what kinds of customer variables are most beneficial. Usually geodemographic clusters are alluded to, sometimes specific demographics, and most articles talk about Recency, Frequency and Monetary (RFM). My experience suggests that only another two or three hundred variables are important in getting a complete picture of your customers' behavior. Even then, they never seem to quite match what you're looking for. You might guess that soccer moms probably drive SUVs, since lots of them do in your neighborhood... but is it worth getting vehicle type when it will only match on 5% of your customers? I never cease to be amazed at how difficult it is to get just what you're looking for. In fact, I have a theory that there are an infinite number of possible variables about each one of your customers. Now, I'm also pretty certain that most of them aren't worth building, like how far customer X is from the center of the sun, or how many witches per capita live in each ZIP code. It's not impossible that those are relevant... on the other hand, we should have some kind of theory behind a variable before we create it. But that isn't really the topic of discussion. The variable I'm interested in is not in the customer database... it is in the modeler.
After decades of successes in modeling, I have run across my share of defeats as well. Thankfully, most of those were handed to other modelers. There was a simple pattern in most of them. We beat them because we built more variables and did more models and validations... I guess we outworked them. Which brings me to perhaps the most important variable missing in most marketing databases: integrity.
Integrity basically means doing what you say. But in modeling, it's so easy to stop at "good enough". It's so easy to pack up at 5 pm and hope it works fine. Tonight, we will be working through the night on a complex triple model. We found out yesterday that our client had misspecified the deadline: it's due tomorrow, not a week from now. Believe me, it is very easy to tell yourself that this is good enough, but we follow our regular procedures, which means probably two dozen validations, a count review and finally three separate pulls. Most importantly, we will continue to look for interesting connections, ways to make our client's project more successful.