Concept drift

From Wikipedia, the free encyclopedia


In predictive analytics, the term concept refers to the quantity you are looking to predict. For example, in a fraud detection application the target concept may be a binary attribute FRAUDULENT with values "yes" or "no" that indicates whether a given transaction is fraudulent. In a weather prediction application, by contrast, there may be several target concepts, such as TEMPERATURE, PRESSURE, and HUMIDITY. More generally, the term concept can also refer to phenomena of interest other than the target concept, such as an input, but in the context of concept drift it refers specifically to the target concept.

The term concept drift, then, refers to unforeseen changes over time in the phenomenon of interest. For example, online shopping behavior changes over time. Suppose you want to predict weekly merchandise sales, and you have developed a predictive model that works to your satisfaction. The model may use inputs such as the amount of money spent on advertising, the promotions you are running, and other metrics that may affect sales. What you are likely to experience is that the model becomes less and less accurate over time: you will be a victim of concept drift. In the merchandise sales application, one reason for concept drift may be seasonality, meaning that shopping behavior changes from season to season. You will likely have higher sales in the winter holiday season than during the summer.
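As a rough illustration, the sketch below simulates weekly sales in which a hypothetical holiday boost ramps up over the last weeks of the year. A one-input linear model (sales as a function of advertising spend) is fitted on the first quarter only, and its error is then measured quarter by quarter. The data-generating functions, the size of the boost, and all names are assumptions made for the example, not real sales figures.

```python
def ad_spend(week):
    """Hypothetical advertising spend that varies from week to week (10..22)."""
    return 10.0 + (week * 7) % 13


def true_sales(week):
    """Hypothetical ground truth: a fixed advertising effect plus a holiday
    boost that grows by 5 units per week from week 40 to week 51."""
    holiday_boost = 5.0 * max(0, week - 39)
    return 50.0 + 3.0 * ad_spend(week) + holiday_boost


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    return mean_y - b * mean_x, b


def quarterly_mae(train_weeks, quarters):
    """Fit once on train_weeks, then return the mean absolute error per quarter."""
    a, b = fit_line([ad_spend(w) for w in train_weeks],
                    [true_sales(w) for w in train_weeks])
    errors = []
    for weeks in quarters:
        abs_errs = [abs(true_sales(w) - (a + b * ad_spend(w))) for w in weeks]
        errors.append(sum(abs_errs) / len(abs_errs))
    return errors


# Train on weeks 0-12, evaluate on the remaining three quarters.
quarters = [range(13, 26), range(26, 39), range(39, 52)]
print(quarterly_mae(range(0, 13), quarters))
# Roughly [0.0, 0.0, 30.0], up to floating-point rounding.
```

The model is perfect while the world matches its training data and degrades once the (unmodeled) season changes, even though nothing about the model itself has changed.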

To prevent prediction accuracy from deteriorating over time, you will have to refresh your model periodically. One approach is to retrain the model with new data; the other is to also add new inputs before retraining it. For our sales prediction application, you may be able to reduce the effect of concept drift by adding information about the season to your model. Providing information about the time of year will likely reduce the rate at which your model deteriorates, but you will likely never be able to prevent concept drift altogether. This is because actual shopping behavior does not follow any static, finite model. New factors that influence shopping behavior may arise at any time, the influence of known factors may change, and so may the interactions between them.
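A minimal sketch of the second approach, assuming hypothetical data in which sales depend on advertising spend plus a fixed boost during an assumed holiday season (weeks 47-51 of each 52-week year). One model uses advertising spend alone; the other is retrained with an added binary input, here called `is_holiday` (a name invented for the example). Both are fitted on one year of history and evaluated on the holiday weeks of the following year. The least-squares solver is a small hand-written Gauss-Jordan elimination on the normal equations, adequate for this tiny, well-conditioned example but not a substitute for a numerical library.

```python
def ad_spend(week):
    """Hypothetical weekly advertising spend (10..22, repeating every 13 weeks)."""
    return 10.0 + (week * 7) % 13


def is_holiday(week):
    """Hypothetical holiday season: the last five weeks of each 52-week year."""
    return 1.0 if week % 52 >= 47 else 0.0


def true_sales(week):
    """Hypothetical ground truth with a fixed holiday boost of 40 units."""
    return 50.0 + 3.0 * ad_spend(week) + 40.0 * is_holiday(week)


def least_squares(rows, ys):
    """Solve (X^T X) beta = X^T y by Gauss-Jordan elimination with pivoting."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry into place.
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def features(week, with_season):
    """Intercept and advertising spend, optionally plus the season input."""
    row = [1.0, ad_spend(week)]
    if with_season:
        row.append(is_holiday(week))
    return row


def holiday_mae(with_season):
    """Train on year 1, return mean absolute error on year 2's holiday weeks."""
    train = range(0, 52)
    test = range(52 + 47, 52 + 52)
    beta = least_squares([features(w, with_season) for w in train],
                         [true_sales(w) for w in train])
    errs = []
    for w in test:
        prediction = sum(b * x for b, x in zip(beta, features(w, with_season)))
        errs.append(abs(true_sales(w) - prediction))
    return sum(errs) / len(errs)


print("advertising only: ", holiday_mae(False))
print("with season input:", holiday_mae(True))
```

Because the simulated data contain no noise and the season is the only missing factor, the model with the season input comes out essentially exact, while the advertising-only model misses the holiday weeks by a wide margin. Real data would leave residual error in both cases, in line with the point that drift can be slowed but not eliminated.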

The bottom line is that concept drift cannot be avoided if you are looking to predict a complex phenomenon that is not governed by fixed laws of nature. All processes that arise from human activity, such as socioeconomic processes, are likely to experience concept drift, as are biological processes. Therefore, periodic retraining, also known as refreshing your model, is inescapable.
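Periodic refreshing can be sketched as a loop that refits the model on a sliding window of recent data at a fixed interval. In this hypothetical example the effect of advertising on sales drops abruptly at week 26 (an assumed change standing in for any shift in behavior). A model fitted once on the first eight weeks never recovers, while one refitted every four weeks on the latest eight weeks is wrong only until its window lies entirely after the change. The window length, refresh interval, data, and function names are illustrative assumptions, not recommendations.

```python
def ad_spend(week):
    """Hypothetical weekly advertising spend (10..22)."""
    return 10.0 + (week * 7) % 13


def true_sales(week):
    """Hypothetical drift: from week 26 on, each unit of advertising
    brings in 1 unit of sales instead of 3."""
    effect = 3.0 if week < 26 else 1.0
    return 50.0 + effect * ad_spend(week)


def fit_line(weeks):
    """Ordinary least squares of sales on ad spend over the given weeks."""
    xs = [ad_spend(w) for w in weeks]
    ys = [true_sales(w) for w in weeks]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def run(refresh_every, window=8, horizon=52):
    """Return {week: absolute error} for weeks window..horizon-1.

    The model starts out fitted on weeks 0..window-1. If refresh_every is
    None it is never refitted; otherwise it is refitted on the most recent
    `window` weeks every `refresh_every` weeks."""
    a, b = fit_line(range(0, window))
    errors = {}
    for week in range(window, horizon):
        if refresh_every and (week - window) % refresh_every == 0:
            a, b = fit_line(range(week - window, week))
        errors[week] = abs(true_sales(week) - (a + b * ad_spend(week)))
    return errors


static, refreshed = run(None), run(4)
late = range(36, 52)
print("static MAE, weeks 36-51:   ", sum(static[w] for w in late) / len(late))
print("refreshed MAE, weeks 36-51:", sum(refreshed[w] for w in late) / len(late))
```

A shorter window adapts faster after a change but gives each fit fewer data points, so the choice of window and refresh interval is a trade-off that depends on the application.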

Bibliographic references

See also

Software

Researchers working on concept drift problems