Describe the various functionalities of Data Mining as a step in the process of Knowledge Discovery.

 


  • Knowledge discovery in databases (KDD) is the process of searching for hidden knowledge in the massive amounts of data that we are technically capable of generating and storing.
  • The basic task of KDD is to extract knowledge (or information) from lower-level data (databases).
  • It is the non-trivial (significant) process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
  • The goal is to pick out, from unprocessed data, something that may not be obvious but is valuable or enlightening once discovered.
  • The overall process of finding and interpreting patterns from data involves the repeated application of the following steps:
    1. Data Cleaning
    2. Data Integration
    3. Data Selection
    4. Data Transformation
    5. Data Mining
    6. Pattern Evaluation
    7. Knowledge Presentation
  • Steps 1 to 4 are different forms of data preprocessing, where data are prepared for mining.
Fig. Data Mining as a step in the process of knowledge discovery
  1. Data Cleaning
    • Noise, inconsistent data, and outliers are removed.
    • Strategies are applied to handle missing data fields.
  2. Data Integration
    • Data from various sources, such as databases, data warehouses, and transactional data, are integrated.
    • Multiple data sources may be combined into a single data format.
  3. Data Selection
    • Data relevant to the analysis task are retrieved from the database.
    • Only the information necessary for the model is collected.
    • Finding useful features to represent data depending on the goal of the task.
  4. Data Transformation
    • Data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations.
    • Transformation methods are used to find invariant representations of the data.
  5. Data Mining
    • An essential process where intelligent methods are applied to extract data patterns.
    • This step also involves deciding which models and parameters may be appropriate.
  6. Pattern Evaluation
    • The truly interesting patterns representing knowledge are identified based on interestingness measures.
  7. Knowledge Presentation
    • Visualization and knowledge representation techniques are used to present mined knowledge to users.
    • Visualizations can be in the form of graphs, charts, or tables.
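
The seven steps above can be sketched as a small pipeline. This is only an illustrative sketch, not a standard implementation: the toy data, the function names, the mean-based fill for missing values, the use of pair counting as the mining method, and the 0.5 support threshold are all assumptions chosen to keep the example short.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical toy sources (assumed for illustration only).
purchases = [
    {"customer_id": 1, "items": ["bread", "milk"], "amount": 12.0},
    {"customer_id": 2, "items": ["bread", "butter", "milk"], "amount": None},
    {"customer_id": 3, "items": ["milk"], "amount": 4.0},
    {"customer_id": 4, "items": ["bread", "butter"], "amount": 8.0},
]
customers = {1: "north", 2: "north", 3: "south", 4: "south"}


def clean(records):
    """Step 1: fill missing amounts with the mean of the known amounts."""
    known = [r["amount"] for r in records if r["amount"] is not None]
    fill = mean(known)
    return [{**r, "amount": fill if r["amount"] is None else r["amount"]}
            for r in records]


def integrate(records, regions):
    """Step 2: combine the two sources into one record format."""
    return [{**r, "region": regions[r["customer_id"]]} for r in records]


def select(records):
    """Step 3: keep only the field relevant to this task (the item lists)."""
    return [r["items"] for r in records]


def transform(baskets):
    """Step 4: consolidate each basket into a sorted tuple of unique items."""
    return [tuple(sorted(set(b))) for b in baskets]


def mine(baskets):
    """Step 5: count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(basket, 2))
    return counts


def evaluate(counts, n_baskets, min_support=0.5):
    """Step 6: keep only pairs whose support meets the (assumed) threshold."""
    return {pair: c / n_baskets for pair, c in counts.items()
            if c / n_baskets >= min_support}


def present(patterns):
    """Step 7: render the surviving patterns as a small text table."""
    rows = [f"{' + '.join(pair):<16}{support:.2f}"
            for pair, support in sorted(patterns.items())]
    return "\n".join(rows)


cleaned = clean(purchases)
baskets = transform(select(integrate(cleaned, customers)))
patterns = evaluate(mine(baskets), len(baskets))
print(present(patterns))
```

Each function corresponds to one numbered step, so a step can be rerun or swapped without touching the others, which mirrors the repeated application of steps described earlier. In a real setting each step would typically be far more involved, for example outlier handling during cleaning or a dedicated mining algorithm in step 5.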