Cross Industry Standard Process for Data Mining

CRISP-DM stands for CRoss Industry Standard Process for Data Mining[1]. It is a data mining process model that describes the approaches expert data miners commonly use to tackle problems.

Major phases

CRISP-DM breaks the process of data mining into six major phases[2]:

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment
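
As an illustration only, the six phases listed above can be written as an ordered enumeration. The names `Phase` and `next_phase` are hypothetical and not part of CRISP-DM, and treating the list order as a default sequence is an assumption of this sketch; the text above gives only the phase names.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """The six CRISP-DM phases, in the order they are listed in the article."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase listed after `current`, or None after the last one.

    Assumption: phases are visited in list order. A real project may
    revisit earlier phases, which this helper does not model.
    """
    members = list(Phase)
    index = members.index(current)
    return members[index + 1] if index + 1 < len(members) else None


if __name__ == "__main__":
    # Print each phase with its position in the list.
    for phase in Phase:
        print(phase.value, phase.name.replace("_", " ").title())
```

A project tracker could, for example, store the current `Phase` and call `next_phase` when a phase is signed off.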

History

CRISP-DM began as a European Union project under the ESPRIT funding initiative. The project was led by four companies: ISL, NCR, Daimler-Benz and OHRA.

This core consortium brought different experiences to the project. ISL was later acquired by and merged into SPSS Inc. NCR, a computer giant, produced the Teradata data warehouse and its own data mining software. Daimler-Benz (now DaimlerChrysler) had a significant data mining team. OHRA, an insurance company, was just starting to explore the potential use of data mining.

The first version of the methodology was released as CRISP-DM 1.0 in 1999.

CRISP-DM 2.0

In July 2006 the consortium announced that it would begin work on a second version of CRISP-DM. On 26 September 2006, the CRISP-DM SIG met to discuss potential enhancements for CRISP-DM 2.0 and the subsequent roadmap.

Advantages

  • Industry neutral
  • Tool neutral
  • Closely related to the KDD process model
  • Anchors the data mining process

References

  1. Shearer, C. (2000). "The CRISP-DM model: the new blueprint for data mining". Journal of Data Warehousing 5: 13–22.
  2. Harper, Gavin; Stephen D. Pickett (August 2006). "Methods for mining HTS data". Drug Discovery Today 11 (15–16): 694–699. doi:10.1016/j.drudis.2006.06.006.
