The Data Mining Handbook
The Data Mining Handbook: Part I Introduction to the Data Mine
Data mining is part art and a good deal of science. It is the extraction of actionable (predictive) information from large databases using advanced algorithms. Organizations have been harnessing massive data collections and algorithms to separate relevant information from the noise created by big data. One of the key goals of any data-mining project is to accurately predict future outcomes from volumes of historical data. In other words, while analytics answers the question, "What were a company's sales over the last few months?", data mining aims to predict what sales will be in the future by mining large databases for predictive information.
Given the advancement in cloud-based services that allow relatively cheap access to massively parallel processing, data mining techniques have evolved significantly over the last few years. They now range from more advanced approaches like artificial neural networks to rule-based techniques, where "if-then" rules are created based on significance. Other techniques, such as decision trees and the nearest neighbor method, are being applied to data warehousing applications, where they can enhance accuracy, yield more actionable insights and run more efficiently against divergent sources of big data.
One of the first critical steps in creating a data mining project is building a "model" from the right data that incorporates all the important relationships between the data fields. Data mapping, diagrams and flowcharts are important tools that help construct an accurate model. A thoughtful, documented data map helps catch errors and plan changes before the first line of code is written. High-level data modeling can be used to identify key relationships between fields. Once this has been accomplished, business rules need to be considered to ensure that the model meets specific business requirements. It is important that all stakeholders are included in the modeling process to make certain all business requirements are taken into account. Once the model reflects a clear understanding of data relationships and business rules, it must also take into account traits like predictors. A predictor is generally a field that holds a critical value representing customer behavior. By ranking key customer predictors and including them in the model, effective data miners can move past assumptions to a truer understanding of what motivates customer behavior.
Multiple predictors can be combined using formulas. For example, key predictive customer fields include: 1) Frequency, 2) Monthly Sales and 3) Personal Income. A simple predictive formula example: 2 x Personal Income + High Frequency = Better Sales Conversions. In other words, if a customer has high personal income and visits the store frequently, then the customer will probably have higher-than-average total sales. Historical data consumed by the model can confirm whether the model is correct. If it is, the model can be used to segment out the highest sales-converting customers for marketing purposes.
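To make the formula concrete, here is a minimal Python sketch of scoring and segmenting customers. It is only an illustration under stated assumptions: the Customer class, the score and top_segment helpers, the 0-to-1 scaling of income and frequency, and the four sample customers are all made up for this example and do not come from any real data set.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    personal_income: float  # assumed to be scaled to the range 0..1
    frequency: float        # assumed to be scaled to the range 0..1
    monthly_sales: float    # historical outcome, used only to check the model


def score(c, income_weight=2.0, frequency_weight=1.0):
    """Apply the simple formula: 2 x Personal Income + Frequency."""
    return income_weight * c.personal_income + frequency_weight * c.frequency


def top_segment(customers, fraction=0.5):
    """Return the highest-scoring fraction of customers (at least one)."""
    ranked = sorted(customers, key=score, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


def mean_sales(customers):
    return sum(c.monthly_sales for c in customers) / len(customers)


# Hypothetical sample data, invented for illustration only.
customers = [
    Customer("A", 0.9, 0.8, 520.0),
    Customer("B", 0.2, 0.9, 210.0),
    Customer("C", 0.7, 0.3, 340.0),
    Customer("D", 0.1, 0.2, 90.0),
]

segment = top_segment(customers)

# A crude check against history: does the segment outsell the average?
model_looks_plausible = mean_sales(segment) > mean_sales(customers)
```

If the top-scoring segment does not outsell the overall average in the historical data, that is a sign the formula or its weights need another look before the segment is used for marketing.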
While this is an admittedly simple model, it should be noted that models can become increasingly complex as predictor fields are expanded and weighted based on historical analysis.
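One common way to weight predictors from historical data is an ordinary least-squares fit. The sketch below is an assumption about how that could look, not a method prescribed here: it fits two weights (no intercept) so that weighted income plus weighted frequency approximates historical sales. The function name fit_two_weights and the sample rows are hypothetical, and the rows were constructed so that the true weights are exactly 2 and 1.

```python
def fit_two_weights(rows):
    """Least-squares fit of sales ~ w_income * income + w_freq * frequency.

    rows is a list of (income, frequency, sales) tuples. The two weights
    are found by solving the 2x2 normal equations with Cramer's rule.
    """
    s_ii = sum(i * i for i, f, s in rows)
    s_ff = sum(f * f for i, f, s in rows)
    s_if = sum(i * f for i, f, s in rows)
    s_is = sum(i * s for i, f, s in rows)
    s_fs = sum(f * s for i, f, s in rows)

    det = s_ii * s_ff - s_if * s_if
    if det == 0:
        raise ValueError("predictors are collinear; weights are not unique")

    w_income = (s_is * s_ff - s_fs * s_if) / det
    w_freq = (s_fs * s_ii - s_is * s_if) / det
    return w_income, w_freq


# Hypothetical history: sales were generated as 2 * income + 1 * frequency.
history = [
    (1.0, 0.0, 2.0),
    (0.0, 1.0, 1.0),
    (1.0, 1.0, 3.0),
    (0.5, 0.2, 1.2),
]

w_income, w_freq = fit_two_weights(history)
```

Real historical data will be noisy, so the fitted weights will only approximate any underlying relationship, and adding more predictor fields calls for a general linear-algebra or statistics library rather than a hand-written 2x2 solve.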
Our next series will focus on more complex models and real-world best practices.