Data Warehouse

on Thursday 11 July 2013
The data warehouse is a collection of data from various sources stored in a data warehouse (repository) in a large capacity and used for decision-making process (Prabhu, 2007). According to William Inmon, the characteristics of the data warehouse are as follows:

  1. Subject oriented : On operating system, data is stored by the application. Data sets only consists of data required by the relevant functions and applications. While the data warehouse, data is not stored by the application, but by subject. For example, for a company manufacturing subjects important business, namely the sale, transport, and storage of goods.
  2. Integrated : Data stored in a data warehouse consists of various systems operational. Therefore there is a possibility that some are difference, ie the naming conventions, code representation, data attributes, and measurement data. The fourth distinction had to be made first must conform to certain standards so that the data that will be stored can be integrated in the data warehouse.
  3. Time variant : In the data warehouse, the data is historical data stored within certain time, not current data. Therefore the data stored contains a description of the time, such as date, week, month, quarterly, and so on. Characteristic time variant in the data warehouse has the following characteristics:
    • Analyzing the case in the past.
    • Looking for a relationship between the current state of information.
    • Make predictions of things to come.
  4. Non-volatile : The data in the operational systems can be updated according to business transactions. Each time business transaction. But in the data warehouse, the data can not be changed because it is read only. Data warehouse architecture includes the ETL (Extraction, Transformation, Loading) to move data from the operational data source and other external data sources into a data warehouse. data warehouse can be divided into multiple data marts, based on business function (eg data marts for sales, marketing, and finance). The data in the data warehouse and data mart managed by one or more servers that represent multidimensional view of data against a variety of front end tools, such as query tools, analysis tools, report writers, and data mining tools. ETL process (Extraction, Transformation, Loading) is a process that must be through the establishment of a data warehouse (Kimball, 2004). 


Here are explanation of each process :
  1. Data Extraction (Extract) : Data extraction is the process by which data is collected or extracted from various operating system, either using a query, or ETL applications. There are some data extraction functions, namely:
    • Automatic data extraction from the source application.
    • Screening or selection of data extraction.
    • Delivery of data from various sources into a data application platform.
    • Changes to the data layout format of the original format.Storage in a temporary file for incorporation with the results extraction from other sources.
  2. Data Transformation (Transformation) : Transformation is the process by which raw data (raw data) extracted filtered and modified in accordance with the applicable business rules. Measures in the transformation of the data is as follows:
    • Map the input data from the original data schema to schema of data warehouse.
    • Converting data types or data formats.
    • Cleanup and disposal of data duplication and errors.
    • Calculation values ​​or first derivatives.
    • Calculating the values ​​of aggregate or summary.
    • Examination integrity of reference data.
    • Filling the empty values ​​with default values.
    • Merging data.
  3. Charging Data (Loading) : The last process that needs to be done is the process of loading data obtained from the transformation into the data warehouse. way to load data is to run SQL scripts periodically.

0 comments:

Post a Comment