Computer Science: Data Preprocessing (part 1)

Overview

Data preprocessing is a data mining technique that transforms raw data into an understandable format. Real-world data is often incomplete, inconsistent, or lacking in certain behaviors or trends, and it is likely to contain many errors. Preprocessing addresses these issues and prepares raw data for further processing.

Data preprocessing is used in database-driven applications such as customer relationship management, in rule-based applications, and in learning models such as neural networks.

Why Do We Need Data Preprocessing?

1. Data in the real world is dirty

incomplete: attribute values are missing, attributes that should exist are absent, or only aggregate data is available
noisy: contains errors or outliers
inconsistent: contains discrepancies in codes or values
redundant: the same data is stored more than once

2. No quality data, no quality mining results (garbage in, garbage out)

quality decisions must be based on quality data
a data warehouse needs to combine data of consistent quality

3. Data extraction, cleaning, and transformation are an important part of building a data warehouse
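
To make the four kinds of dirty data from item 1 concrete, here is a minimal sketch that counts how many rows of a small table show each problem. The records, the `profile` function, the plausible age range, and the set of valid country codes are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": None, "country": "USA"},  # incomplete, and inconsistent coding
    {"id": 3, "age": 290, "country": "US"},    # noisy: impossible age
    {"id": 1, "age": 34, "country": "US"},     # redundant: exact duplicate of the first row
]

def profile(rows):
    """Count the rows affected by each kind of data quality problem."""
    incomplete = sum(1 for r in rows if any(v is None for v in r.values()))
    # Assumed plausibility range for age; a real range depends on the domain.
    noisy = sum(1 for r in rows if r["age"] is not None and not 0 <= r["age"] <= 120)
    # Assumed set of valid country codes.
    inconsistent = sum(1 for r in rows if r["country"] not in {"US"})
    # Every repeat of an identical row beyond the first counts as redundant.
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    redundant = sum(n - 1 for n in seen.values())
    return {
        "incomplete": incomplete,
        "noisy": noisy,
        "inconsistent": inconsistent,
        "redundant": redundant,
    }

report = profile(records)
```

A report like this is only a first look at the data. It tells you which problems exist, not how to repair them.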

Data Preprocessing Tasks

Data Cleaning

fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
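
As a rough sketch of two of these cleaning steps, the code below fills missing values with the median and then drops outliers using the interquartile range (IQR) rule. Both choices are assumptions for illustration. The function names `fill_missing` and `remove_outliers` and the sample ages are hypothetical, and other fill strategies or outlier rules may fit a given dataset better.

```python
from statistics import median, quantiles

def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    m = median(observed)
    return [m if v is None else v for v in values]

def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical ages with one missing value and one obvious outlier.
ages = [23, 25, None, 27, 24, 26, 250]
filled = fill_missing(ages)
cleaned = remove_outliers(filled)
```

The median is used instead of the mean so that the outlier (250) does not distort the value that fills the gap.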

Data Integration

integration of multiple databases or files
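
A minimal sketch of integration, assuming two hypothetical sources (`crm_rows` and `billing_rows`) that describe the same customers under different field names. The `integrate` function maps both onto one schema and joins them on the customer id.

```python
# Two hypothetical sources that name the customer key differently.
crm_rows = [
    {"cust_id": 1, "name": "Ana"},
    {"cust_id": 2, "name": "Ben"},
]
billing_rows = [
    {"customer": 1, "total": 120.0},
    {"customer": 3, "total": 40.0},
]

def integrate(crm, billing):
    """Left-join billing totals onto CRM rows by customer id."""
    totals = {row["customer"]: row["total"] for row in billing}
    return [
        # Customers without a billing row get None as their total.
        {"id": r["cust_id"], "name": r["name"], "total": totals.get(r["cust_id"])}
        for r in crm
    ]

merged = integrate(crm_rows, billing_rows)
```

Note that the join itself can introduce missing values (customer 2 has no billing row), which is one reason integration and cleaning usually go together.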

Data Transformation

normalization and aggregation
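
Both operations can be sketched in a few lines. The example assumes min-max scaling as the normalization method and a per-key sum as the aggregation. The function names and sample values are hypothetical.

```python
from collections import defaultdict

def min_max(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def total_by_key(pairs):
    """Aggregate (key, amount) pairs into one total per key."""
    totals = defaultdict(float)
    for key, amount in pairs:
        totals[key] += amount
    return dict(totals)

scaled = min_max([10, 20, 40])
monthly = total_by_key([("2024-01", 5.0), ("2024-01", 7.5), ("2024-02", 3.0)])
```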

Data Reduction

obtains a reduced representation of the data that is smaller in volume but produces the same or similar analytical results
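
One simple way to reduce volume is sampling. The sketch below assumes systematic sampling (keeping every k-th value), which is only one of several possible reduction techniques, and compares the mean before and after. The data and the `systematic_sample` helper are hypothetical.

```python
from statistics import mean

def systematic_sample(values, step):
    """Keep every step-th value, starting with the first."""
    return values[::step]

# Hypothetical measurements: the integers 1..1000.
values = list(range(1, 1001))
sample = systematic_sample(values, 10)

full_mean = mean(values)
sample_mean = mean(sample)
```

Here the sample holds one tenth of the data, and its mean stays close to the mean of the full set. How well this holds in practice depends on how the data is ordered and distributed.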

Data Discretization

part of data reduction, but of particular importance for numerical data
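
As an illustration, the sketch below assumes equal-width binning, one common way to discretize a numerical attribute. It replaces each value with the index of the interval it falls into. The function name and the sample ages are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each value to the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are equal, so everything lands in the first bin.
        return [0 for _ in values]
    # The maximum sits on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [18, 22, 35, 47, 60, 78]
bins = equal_width_bins(ages, 3)
```

With three bins over the range 18 to 78, each interval is 20 wide, so six distinct ages are reduced to three labels.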
