In an earlier article I briefly described how data mining can help marketers be more efficient (read: increased marketing ROI!). These marketing analytics tools can significantly help with all direct marketing efforts (multichannel campaign management using direct mail, email and call center) and with some interactive marketing efforts as well. So, why aren’t all companies using them today? Typically it comes down to a lack of data, a lack of statistical expertise, or both. Even if you don’t have data mining expertise in-house, you can still benefit from data mining by using a consultant. With that in mind, let’s tackle the first problem: collecting and developing the data that is useful for data mining.
The most important data to collect for data mining include:
Transaction data – For every sale, you need to know at least the product, the amount, and the date of the purchase.
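Past campaign response data – For every campaign you’ve run, you need to identify who responded and who didn’t. You may need both direct response attribution (for example, a coupon code or reply card tied to the campaign) and indirect response attribution (for example, a purchase shortly after the campaign with no code attached).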
Geo-demographic data – This is optional, but you may want to append consumer overlay data from companies like Acxiom to your customer file or database.
Lifestyle data – This is another optional append: indicators of socio-economic lifestyle developed by companies like Claritas.
All of the above data may or may not exist in the same data source. Some companies have a single holistic view of the customer in a database and some don’t. If you don’t, you’ll have to make sure all data sources that contain customer data share the same customer ID/key. That way, all of the needed data can be brought together for data mining.
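To make the shared-key idea concrete, here is a minimal sketch in plain Python of pulling three separate sources into one record per customer. The field names (customer_id, amount, responded, household_income and so on) and the sample rows are made up for illustration; your own sources will look different, and in practice you would probably do this in a database or a data mining tool.

```python
from collections import defaultdict

# Hypothetical sample sources. All field names and values are illustrative.
transactions = [
    {"customer_id": 101, "product": "widget", "amount": 25.0, "date": "2009-03-01"},
    {"customer_id": 101, "product": "gadget", "amount": 40.0, "date": "2009-04-15"},
    {"customer_id": 102, "product": "widget", "amount": 25.0, "date": "2009-03-20"},
]
responses = [
    {"customer_id": 101, "campaign": "spring_mail", "responded": True},
    {"customer_id": 102, "campaign": "spring_mail", "responded": False},
    {"customer_id": 103, "campaign": "spring_mail", "responded": False},
]
demographics = {
    101: {"household_income": 72000, "children": 2},
    103: {"household_income": 48000, "children": 0},
}


def build_customer_view(transactions, responses, demographics):
    """Combine all sources into one record per customer, keyed on customer_id."""
    view = defaultdict(lambda: {
        "order_count": 0, "total_spend": 0.0, "last_purchase": None,
        "campaigns_sent": 0, "campaigns_responded": 0,
        "household_income": None, "children": None,
    })
    for t in transactions:
        rec = view[t["customer_id"]]
        rec["order_count"] += 1
        rec["total_spend"] += t["amount"]
        # ISO-formatted dates (YYYY-MM-DD) compare correctly as strings.
        if rec["last_purchase"] is None or t["date"] > rec["last_purchase"]:
            rec["last_purchase"] = t["date"]
    for r in responses:
        rec = view[r["customer_id"]]
        rec["campaigns_sent"] += 1
        rec["campaigns_responded"] += int(r["responded"])
    for customer_id, overlay in demographics.items():
        # Optional appended data; customers without it keep None.
        view[customer_id].update(overlay)
    return dict(view)


customer_view = build_customer_view(transactions, responses, demographics)
```

Customers missing from one source still get a record, with zero counts or empty overlay fields. Those gaps are exactly what the missing data analysis later on has to deal with.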
How much data do you need for data mining? You’ll hear many different answers, but I like to have at least 15,000 customer records to have confidence in my results.
Once you have the data, you need to massage it to get it ready to be “baked” by your data mining application. Some data mining applications will do this for you automatically. It’s like a bread machine: you put in all the ingredients, they get mixed, the bread rises, bakes, and is ready for consumption! Notable vendors whose tools do this include KXEN, SAS, and SPSS. Even if you take the automated approach, it’s helpful to understand what kinds of things are done to the data prior to model building.
Missing data analysis. What fields have missing values? Should you fill in the missing values? If so, what values do you use? Should the field be used at all?
Outlier detection. Is “33 children in a household” extreme? Probably, and consequently this value should be adjusted, perhaps to the average or to the largest plausible number of children in your customers’ households.
Transformations and standardizations. When various fields have vastly different ranges (e.g., number of children per household and income), it’s often helpful to standardize or normalize your data to get better results. It’s also useful to transform data to get better predictive relationships. For instance, it’s common to transform monetary variables by using their natural logs.
Binning data. Binning continuous variables (for example, grouping incomes into a handful of ranges) is an approach that can help with noisy data. It is also required by some data mining algorithms.
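The four preparation steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not what any particular data mining product does internally: the sample values, the choice of the median as the fill value, the cap of 6 children, and the income bin edges are all made up for the example.

```python
import math
import statistics


def impute_missing(values, fill=None):
    """Replace None with `fill`; by default use the median of the known values."""
    known = [v for v in values if v is not None]
    if fill is None:
        fill = statistics.median(known)
    return [fill if v is None else v for v in values]


def cap_outliers(values, upper):
    """Pull any value above `upper` down to `upper` (the cap is a judgment call)."""
    return [min(v, upper) for v in values]


def log_transform(values):
    """Natural log of each value; assumes every value is strictly positive."""
    return [math.log(v) for v in values]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # a constant field carries no information
    return [(v - mean) / sd for v in values]


def bin_values(values, edges):
    """Bin index for each value: how many of the ascending edges it reaches."""
    return [sum(v >= e for e in edges) for v in values]


# Hypothetical fields for six customers.
children = [2, None, 0, 33, 1, 3]
income = [72000.0, 48000.0, 150000.0, 31000.0, 95000.0, 60000.0]

children_clean = cap_outliers(impute_missing(children), upper=6)
income_scaled = standardize(log_transform(income))
income_band = bin_values(income, edges=[50000, 100000])  # 0=low, 1=mid, 2=high
```

Here the missing child count becomes 2 (the median of the known values), the household with 33 children is capped at 6, and incomes fall into three bands. Whether to fill, cap, or simply drop a field is still a call you make by looking at your own data.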