CRISP-DM (Cross-Industry Standard Process for Data Mining)

on Friday, 19 July 2013
CRISP-DM (Cross-Industry Standard Process for Data Mining) is a process model developed by a consortium of companies that began the effort in 1996 with backing from the European Commission. It has since become established as a standard process for data mining that can be applied across industrial sectors. The image below shows the data mining development life cycle as defined in CRISP-DM.



The six stages of the data mining development life cycle are as follows (Chapman, 2000):

1. Business Understanding

The first stage is to understand the goals and requirements from the business point of view, then translate this knowledge into a data mining problem definition. A plan and strategy for achieving those goals are then drawn up.

2. Data Understanding

This phase begins with data collection, followed by activities to gain a deep understanding of the data, identify data quality problems, and detect interesting subsets of the data that can be used to form hypotheses about hidden information.
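As a rough illustration of what a first look at data quality might involve, the sketch below counts missing and distinct values per attribute and flags values outside a plausible range. The records, the attribute names, and the helpers `profile` and `out_of_range` are all hypothetical and are not part of CRISP-DM itself.

```python
# Hypothetical raw records; None marks a missing value.
records = [
    {"age": 34, "city": "Jakarta", "income": 5200},
    {"age": None, "city": "Bandung", "income": 4100},
    {"age": 29, "city": "Jakarta", "income": None},
    {"age": 290, "city": "Surabaya", "income": 3900},  # suspicious age
]


def profile(rows):
    """Return, per attribute, the number of missing and distinct values."""
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        present = [v for v in values if v is not None]
        summary[column] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return summary


def out_of_range(rows, column, low, high):
    """Return rows whose value in `column` is present but outside [low, high]."""
    return [
        row for row in rows
        if row[column] is not None and not low <= row[column] <= high
    ]


report = profile(records)
suspicious = out_of_range(records, "age", 0, 120)

for column, stats in report.items():
    print(column, stats)
print("suspicious ages:", [row["age"] for row in suspicious])
```

Findings like these (a missing income, an implausible age) are the kind of data quality problems that feed into the next phase.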

3. Data Preparation

This phase covers all activities needed to build the final data set (the data that will be fed into the modelling stage) from the raw data. This stage may be repeated multiple times. It includes the selection of tables, records, and attributes, as well as data cleansing and transformation, so that the result can be used as input to the modelling phase.
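The three activities named above (selection, cleansing, transformation) can be sketched in a few lines. This is only a minimal example under assumed inputs: the raw rows, the `SELECTED` attribute list, and the `prepare` function are hypothetical, and dropping incomplete records is just one of several possible cleansing strategies.

```python
# Hypothetical raw rows as they might arrive from a source table (all strings).
raw = [
    {"id": 1, "age": "34", "city": " jakarta ", "income": "5200", "notes": "x"},
    {"id": 2, "age": "", "city": "Bandung", "income": "4100", "notes": "y"},
    {"id": 3, "age": "29", "city": "JAKARTA", "income": "6100", "notes": "z"},
]

# Attributes assumed to be relevant for modelling.
SELECTED = ("age", "city", "income")


def prepare(rows):
    """Select attributes, drop incomplete records, and convert types."""
    prepared = []
    for row in rows:
        # Selection: keep only the attributes of interest.
        picked = {name: row[name] for name in SELECTED}
        # Cleansing: skip records with any empty value.
        if any(value.strip() == "" for value in picked.values()):
            continue
        # Transformation: normalise text and convert numbers.
        prepared.append({
            "age": int(picked["age"]),
            "city": picked["city"].strip().title(),
            "income": float(picked["income"]),
        })
    return prepared


dataset = prepare(raw)
print(dataset)
```

Because the phase can be repeated, keeping the preparation in a single function makes it easy to rerun after the selection or cleansing rules change.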

4. Modelling

In this stage, modelling techniques are selected and applied, and their parameters are tuned to optimal values. Typically, several different techniques can be applied to the same data mining problem. Some modelling techniques, however, require the data in a specific format, so it is often necessary to return to the data preparation stage.
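To make the idea of parameter tuning concrete, here is a small sketch that tries several values of k for a one-dimensional k-nearest-neighbours classifier and keeps the one with the best accuracy on a held-out validation set. The technique, the toy data, and the names `predict`, `accuracy`, and `best_k` are chosen purely for illustration; CRISP-DM does not prescribe any particular model.

```python
from collections import Counter

# Hypothetical labelled data: (feature value, class label).
train = [
    (1.0, "low"), (2.0, "low"), (3.0, "low"),
    (3.5, "high"),  # a noisy point sitting among the "low" examples
    (7.0, "high"), (8.0, "high"), (9.0, "high"),
]
validation = [(1.5, "low"), (3.4, "low"), (7.5, "high"), (8.5, "high")]


def predict(x, k):
    """Return the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    labels = Counter(label for _, label in nearest)
    return labels.most_common(1)[0][0]


def accuracy(k):
    """Return the fraction of validation points classified correctly with k."""
    hits = sum(1 for x, label in validation if predict(x, k) == label)
    return hits / len(validation)


candidates = (1, 3, 5)
# max() returns the first candidate with the highest accuracy.
best_k = max(candidates, key=accuracy)

for k in candidates:
    print(f"k={k}: accuracy={accuracy(k):.2f}")
print("selected k:", best_k)
```

With this toy data, k=1 is misled by the noisy point, while larger values of k are not, which is the sort of effect parameter tuning is meant to expose.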

5. Evaluation

At this stage, a model has been built that is expected to be of good quality from a data analysis perspective. The model's effectiveness and quality are evaluated before it is put to use, to determine whether it achieves the goals set in the first phase (Business Understanding). A key objective of this step is to determine whether any important business issue has not been sufficiently considered. At the end of this stage, a decision should be reached on how the results of the data mining process will be used.
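One way to picture the check against the original business goals is to compare model metrics with thresholds agreed during Business Understanding. The sketch below assumes a binary classifier and uses precision and recall; the threshold values, the labels, and the function `precision_recall` are hypothetical.

```python
# Hypothetical success criteria agreed during Business Understanding.
REQUIRED_PRECISION = 0.80
REQUIRED_RECALL = 0.60

# Hypothetical test-set labels (1 = positive case) and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]


def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels where 1 is positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


precision, recall = precision_recall(actual, predicted)
meets_goals = precision >= REQUIRED_PRECISION and recall >= REQUIRED_RECALL

print(f"precision={precision:.2f} recall={recall:.2f} meets goals: {meets_goals}")
```

A numeric check like this covers only part of the evaluation; whether a business issue has been overlooked still needs a review with the stakeholders.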

6. Deployment

At this stage, the knowledge or information that has been gained is organized and presented in a form that users can work with. The deployment phase can be as simple as producing a report or as complex as implementing a repeatable data mining process across the company. In many cases, deployment involves the customers and not only the data analyst, so it is very important for them to understand what actions they need to take to make use of the models that have been created.
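The six stages do not form a strictly linear sequence, since some of them feed back into earlier ones. The sketch below encodes the stages as a small graph, assuming the transitions commonly drawn in the CRISP-DM life cycle diagram; the names `TRANSITIONS` and `is_valid_path` are hypothetical helpers for illustration only.

```python
# Allowed moves between stages, assuming the arrows of the usual
# CRISP-DM life cycle diagram (an assumption, not taken from this text).
TRANSITIONS = {
    "Business Understanding": {"Data Understanding"},
    "Data Understanding": {"Business Understanding", "Data Preparation"},
    "Data Preparation": {"Modelling"},
    "Modelling": {"Data Preparation", "Evaluation"},
    "Evaluation": {"Business Understanding", "Deployment"},
    "Deployment": set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of stages is an allowed move."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


# A project that returns from Modelling to Data Preparation once.
project = [
    "Business Understanding", "Data Understanding", "Data Preparation",
    "Modelling", "Data Preparation", "Modelling", "Evaluation", "Deployment",
]
print(is_valid_path(project))
print(is_valid_path(["Business Understanding", "Deployment"]))
```

In practice the stages are guidance rather than hard rules, so a check like this is a way of visualising the cycle and not a constraint a project must enforce.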
