Data Mining Tasks

on Thursday 27 June 2013
Generally, tasks in data mining fall into two categories:
  • Predictive
         uses some variables to predict the unknown or future values of other variables
  • Descriptive
         finds human-interpretable patterns that describe the data

In more detail, the tasks are:
  • Classification (predictive)
  • Grouping / Clustering (descriptive)
  • Association Rules (descriptive)
  • Sequential Pattern (descriptive)
  • Regression (predictive)
  • Deviation Detection (predictive)


Classification  

Given a collection of records (the training set), each record contains a set of attributes, one of which is the class. Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
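
As a rough illustration of the idea, here is a minimal sketch of a 1-nearest-neighbor classifier. This is only one of many possible models, and it assumes numeric attributes. The function name `nearest_neighbor_classify` and the sample records are made up for this example.

```python
import math

def nearest_neighbor_classify(training_set, record):
    """Return the class of the training record closest to `record`.

    training_set: list of (attributes, class_label) pairs (hypothetical layout).
    record: tuple of numeric attributes for a previously unseen record.
    """
    closest = min(training_set, key=lambda row: math.dist(row[0], record))
    return closest[1]

# Made-up training set: each record is (attributes, class).
training_set = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

# Assign a class to a record that was not in the training set.
predicted = nearest_neighbor_classify(training_set, (2.0, 1.5))
print(predicted)
```

In practice the accuracy of such a model would be checked on a separate test set that was not used for training.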

Grouping / Clustering

Given a set of data points, each with a set of attributes, and a similarity measure among them, find groups (clusters) such that:
- Data points within a group are more similar to each other
- Data points in separate groups are less similar to each other

Similarity measures:
- Euclidean distance if the attributes are continuous
- In other cases, a measure adapted to the problem
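
To make this concrete, here is a minimal sketch of k-means clustering using Euclidean distance. The post does not name k-means, so treat it as one assumed example of a clustering method. The function `k_means`, the starting centroids and the sample points are all invented for illustration.

```python
import math

def k_means(points, centroids, iterations=10):
    """Group points around centroids using Euclidean distance.

    points: list of numeric tuples; centroids: initial guesses (assumed given).
    Returns the clusters from the last assignment step and the final centroids.
    """
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Put each point in the cluster of its nearest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
clusters, centroids = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

With these made-up points, the three points near the origin should end up in one group and the three points near (8.5, 9) in the other.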

Association Rules

Given a set of records, each containing some number of items from a known collection, generate dependency rules that predict the occurrence of an item based on the occurrence of other items.
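
A common way to judge such a rule is by its support and confidence. These two measures are not defined in this post, so the sketch below brings them in as an assumption. The helper names and the basket data are made up for the example.

```python
def count_containing(transactions, items):
    """Count the transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)

def support(transactions, items):
    """Fraction of all transactions that contain `items`."""
    return count_containing(transactions, items) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`. Returns 0.0 if the antecedent never occurs."""
    with_antecedent = count_containing(transactions, antecedent)
    if with_antecedent == 0:
        return 0.0
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / with_antecedent

# Made-up market basket records.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

# Rule under test: {diapers} -> {beer}
print(support(transactions, {"diapers", "beer"}))
print(confidence(transactions, {"diapers"}, {"beer"}))
```

A rule would usually be kept only if both values pass thresholds chosen by the analyst.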


Sequential Pattern

Given a set of objects, each associated with its own timeline of events, find sequential rules that predict strong dependencies between different events.
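
As a small sketch of what checking one candidate rule could look like, the code below measures how often one event is followed later by another within the same timeline. It does not search for patterns. The function `sequence_confidence` and the event logs are hypothetical.

```python
def sequence_confidence(timelines, first, then):
    """Fraction of timelines containing `first` in which `then`
    occurs at some point after the first occurrence of `first`."""
    with_first = 0
    with_both = 0
    for events in timelines:
        if first in events:
            with_first += 1
            # Only look at events that come after `first`.
            if then in events[events.index(first) + 1:]:
                with_both += 1
    return with_both / with_first if with_first else 0.0

# Made-up timelines: one ordered list of events per object.
timelines = [
    ["login", "search", "purchase"],
    ["login", "search", "logout"],
    ["search", "purchase"],
    ["login", "purchase", "search"],
]

print(sequence_confidence(timelines, "search", "purchase"))
```

Note that order matters here: in the last timeline the purchase comes before the search, so it does not count toward the rule "search, then purchase".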

Regression

Predict the value of a continuous variable based on the values of other variables, assuming a linear or non-linear model of dependence. Regression is studied extensively in statistics and in the neural network field.
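
For the linear case with a single input variable, a minimal sketch using ordinary least squares might look like this. The function name `fit_line` and the sample values are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
# Predict the value for an unseen x.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```

Real data rarely fits a line exactly, so the fitted line is the one that minimizes the squared error over the known points.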

Deviation Detection

Deviation detection is an important task in data mining. For identifying deviations, the deviation-based approach has several advantages and has attracted much attention. Although a linear algorithm for sequential deviation detection has been proposed, it is not stable and consistently misses many deviation points.
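
To give a feel for the task, here is a minimal sketch that flags values far from the mean using a z-score threshold. This is a simpler, swapped-in technique and not the deviation-based or sequential algorithm mentioned above. The function `find_deviations`, the threshold of 2.0 and the readings are assumptions.

```python
import statistics

def find_deviations(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # All values are identical, so nothing deviates.
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Made-up sensor readings with one obvious outlier.
readings = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 50.0]

print(find_deviations(readings))
```

One known weakness of this simple approach is that a large outlier inflates the mean and standard deviation, which can hide smaller deviations.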
