Overview
Data preprocessing is used in database-driven applications such as customer relationship management and in rule-based applications (like neural networks).
Why do we need Data Preprocessing?
1. Data in the real world is dirty
- incomplete: attribute values are missing, attributes that should exist are absent, or only aggregate data is available
- noisy: contains errors or outliers
- inconsistent: contains discrepancies in codes and values
- redundant: contains duplicate data
2. No quality data, no quality mining results (garbage in, garbage out)
- quality decisions must be based on quality data
- a data warehouse needs to combine data that meets a certain level of quality
3. Data extraction, cleaning, and transformation are an important part of building a data warehouse
Data Preprocessing Tasks
Data Cleaning
- fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
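As a rough illustration of the cleaning step, here is a minimal Python sketch. It assumes missing values are stored as None, fills them with the median, and treats anything outside 1.5 times the interquartile range as an outlier. The function names and both rules are choices made for this example, not the only way to clean data.

```python
from statistics import median, quantiles

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical attribute with one missing value and one suspicious age.
ages = [23, 25, None, 24, 26, 120, 22]
cleaned = remove_outliers(fill_missing(ages))
print(cleaned)
```

The median is used instead of the mean because the mean itself would be pulled upward by the outlier that has not been removed yet.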
Data Integration
- integration of multiple databases or files
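Integration can be sketched as a join between two sources that name the same key differently. The example below assumes one source holds customer records keyed by cust_id and another holds orders keyed by customer_id. These field names, the sample records, and the function name integrate are all hypothetical.

```python
def integrate(customers, orders):
    """Attach customer names to orders, matching cust_id to customer_id."""
    # Index the first source by its key so each lookup is a dictionary access.
    names = {row["cust_id"]: row["name"] for row in customers}
    merged = []
    for order in orders:
        merged.append({
            "customer_id": order["customer_id"],
            # None marks an order whose customer is missing from the first source.
            "name": names.get(order["customer_id"]),
            "amount": order["amount"],
        })
    return merged

# Hypothetical records from a database table and from a file.
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [{"customer_id": 1, "amount": 30.0}, {"customer_id": 3, "amount": 12.5}]
for row in integrate(customers, orders):
    print(row)
```

Unmatched keys are kept rather than dropped so that the gap stays visible and can be handled during cleaning.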
Data Transformation
- normalization and aggregation
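Both transformations can be shown in a few lines of Python. This sketch assumes min-max scaling as the normalization method and a sum per key as the aggregation. Other methods exist, and the function names here are only illustrative.

```python
from collections import defaultdict

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they fall in [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no range to scale.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def aggregate_by_key(rows):
    """Sum the amounts of (key, amount) pairs that share a key."""
    totals = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)

print(min_max_normalize([10, 20, 30]))
print(aggregate_by_key([("Q1", 100.0), ("Q2", 150.0), ("Q1", 50.0)]))
```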
Data Reduction
- obtain a representation that is reduced in volume but produces the same or similar analytical results
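One simple way to reduce volume is random sampling. The sketch below assumes simple random sampling without replacement is acceptable for the analysis. The function name sample_rows, the fraction, and the seed are choices made for this example.

```python
import random

def sample_rows(rows, fraction, seed=0):
    """Return a simple random sample of rows, drawn without replacement."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not rows:
        return []
    k = max(1, round(len(rows) * fraction))
    # A seeded generator makes the sample reproducible between runs.
    return random.Random(seed).sample(rows, k)

# Hypothetical data set of 1000 numeric records reduced to 10 percent.
data = list(range(1000))
reduced = sample_rows(data, 0.1)
print(len(reduced), sum(reduced) / len(reduced), sum(data) / len(data))
```

Statistics computed on the sample only approximate those of the full data, so how small the sample can be depends on how much error the analysis can tolerate.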
Data Discretization
- a part of data reduction that is particularly important for numerical data
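For numerical data, discretization can be as simple as equal-width binning. This is a minimal sketch that assumes the bins should split the observed range evenly. The function name equal_width_bins and the sample ages are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index from 0 to n_bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are equal, so everything belongs to the first bin.
        return [0 for _ in values]
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [18, 22, 35, 47, 60, 78]
print(equal_width_bins(ages, 3))  # [0, 0, 0, 1, 2, 2]
```

Here the range 18 to 78 is cut into three intervals of width 20, so the six ages are replaced by three interval labels.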