- Knowledge discovery in databases (KDD) is the process of searching for hidden knowledge in the massive amounts of data that we are technically capable of generating and storing.
- The basic task of KDD is to extract knowledge (or information) from lower-level data (databases).
- It is the non-trivial (significant) process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
- The goal is to distinguish, from unprocessed data, something that may not be obvious but is valuable or enlightening once discovered.
- The overall process of finding and interpreting patterns from data involves the repeated application of the following steps:
- Data Cleaning
- Data Integration
- Data Selection
- Data Transformation
- Data Mining
- Pattern Evaluation
- Knowledge Presentation
- The first four steps (data cleaning, integration, selection, and transformation) are different forms of data preprocessing, in which the data are prepared for mining.
Fig. Data Mining as a step in the process of knowledge discovery
- Data Cleaning
- Removal of noise, inconsistent data, and outliers
- Strategies to handle missing data fields.
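The cleaning step above can be sketched in a few lines. This is only an illustration under stated assumptions: missing fields are represented as `None`, they are filled with the median, and outliers are dropped with the common 1.5 × IQR rule. The notes do not prescribe any of these choices, and the `clean` function and `readings` data are hypothetical.

```python
from statistics import median, quantiles

def clean(values):
    """Fill missing entries (None) with the median, then drop IQR outliers.

    Assumes at least two non-missing values are present.
    """
    present = [v for v in values if v is not None]
    fill = median(present)
    filled = [fill if v is None else v for v in values]

    # Quartiles of the filled data; anything beyond 1.5 * IQR is treated as an outlier.
    q1, _, q3 = quantiles(filled, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in filled if low <= v <= high]

# Hypothetical sensor readings with one missing field and one extreme value.
readings = [10, 12, None, 11, 13, 500, 12]
cleaned = clean(readings)
print(cleaned)
```

Median filling is only one strategy for missing fields; dropping the record or using a model-based estimate may suit other data better.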
- Data Integration
- Data from various sources, such as databases, data warehouses, and transactional data stores, are integrated.
- Multiple data sources may be combined into a single, consistent data format.
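As a rough sketch of integration, the snippet below combines two sources that name the same customer key differently into one record format. The source names (`crm_rows`, `sales_rows`), the field names, and the `integrate` helper are all assumptions made for illustration.

```python
# Two hypothetical sources that describe the same customers with different schemas.
crm_rows = [
    {"cust_id": 1, "name": "Ana"},
    {"cust_id": 2, "name": "Ben"},
]
sales_rows = [
    {"customer": 1, "amount": 30.0},
    {"customer": 1, "amount": 20.0},
    {"customer": 3, "amount": 5.0},
]

def integrate(crm, sales):
    """Merge both sources into one record per customer: id, name, total."""
    merged = {
        row["cust_id"]: {"id": row["cust_id"], "name": row["name"], "total": 0.0}
        for row in crm
    }
    for sale in sales:
        # Customers seen only in the sales source get a record with an unknown name.
        record = merged.setdefault(
            sale["customer"], {"id": sale["customer"], "name": None, "total": 0.0}
        )
        record["total"] += sale["amount"]
    return sorted(merged.values(), key=lambda r: r["id"])

unified = integrate(crm_rows, sales_rows)
for record in unified:
    print(record)
```

Keeping customer 3 with a `None` name, rather than discarding it, leaves the gap visible for the cleaning step to handle.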
- Data Selection
- Data relevant to the analysis task is retrieved from the database.
- Only the information necessary for the model is collected.
- Finding useful features to represent data depending on the goal of the task.
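Selection can be pictured as filtering rows and keeping only the task-relevant attributes. The sketch below assumes records are plain dictionaries; the `select` helper, the field names, and the spending threshold are hypothetical.

```python
def select(records, columns, predicate):
    """Keep rows that satisfy the predicate, projected onto the chosen columns."""
    return [{c: r[c] for c in columns} for r in records if predicate(r)]

# Hypothetical customer records; "fax" is irrelevant to the analysis task.
customers = [
    {"id": 1, "age": 34, "city": "Pune", "spend": 120.0, "fax": None},
    {"id": 2, "age": 51, "city": "Oslo", "spend": 15.0, "fax": "n/a"},
    {"id": 3, "age": 27, "city": "Lima", "spend": 300.0, "fax": None},
]

# Task: study high-spending customers using only age and spend as features.
relevant = select(customers, ["id", "age", "spend"], lambda r: r["spend"] >= 100.0)
print(relevant)
```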
- Data Transformation
- Data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations.
- Transformation methods are used to find invariant representations of the data.
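A minimal sketch of the two ideas above: an aggregation that summarizes transactions per store, followed by min-max scaling as one possible unit-independent representation. The function names and data are assumptions; the notes do not say which transformation methods are intended.

```python
from collections import defaultdict

def aggregate(transactions):
    """Summarize (store, amount) pairs into a total per store."""
    totals = defaultdict(float)
    for store, amount in transactions:
        totals[store] += amount
    return dict(totals)

def min_max_scale(totals):
    """Rescale totals to the range [0, 1] so they no longer depend on the unit."""
    lo, hi = min(totals.values()), max(totals.values())
    span = hi - lo
    # If every value is equal, map everything to 0.0 to avoid dividing by zero.
    return {k: (v - lo) / span if span else 0.0 for k, v in totals.items()}

# Hypothetical transaction log.
transactions = [("north", 100.0), ("south", 40.0), ("north", 50.0), ("east", 95.0)]
totals = aggregate(transactions)
scaled = min_max_scale(totals)
print(totals)
print(scaled)
```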
- Data Mining
- An essential process where intelligent methods are applied to extract data patterns.
- This includes deciding which models and parameters may be appropriate.
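To make the mining step concrete, the sketch below uses frequent-pair counting as the "intelligent method" and a minimum support threshold as the parameter to choose. This is one possible technique picked for illustration, not the method the notes prescribe; `frequent_pairs`, the baskets, and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (fraction of baskets) meets the threshold."""
    counts = Counter()
    for basket in baskets:
        # Sort and deduplicate so each pair is counted once per basket, in a fixed order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical market baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "eggs"],
]
patterns = frequent_pairs(baskets, min_support=0.5)
print(patterns)
```

Lowering `min_support` yields more patterns, most of them less reliable, which is why the parameter choice is part of the mining step.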
- Pattern Evaluation
- The truly interesting patterns representing knowledge are identified based on interestingness measures.
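Support, confidence, and lift are commonly used interestingness measures for association rules, and the sketch below computes them for a rule "antecedent implies consequent". The choice of these three measures, the `rule_measures` helper, and the data are assumptions made for illustration.

```python
def rule_measures(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes both items occur in at least one basket.
    """
    sets = [set(b) for b in baskets]
    n = len(sets)
    both = sum(1 for s in sets if antecedent in s and consequent in s)
    ante = sum(1 for s in sets if antecedent in s)
    cons = sum(1 for s in sets if consequent in s)

    support = both / n              # how often the rule applies at all
    confidence = both / ante        # how often it holds when the antecedent occurs
    lift = confidence / (cons / n)  # values above 1 suggest a real association
    return support, confidence, lift

# Hypothetical market baskets; milk appears in every one of them.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "eggs"],
]
bread_milk = rule_measures(baskets, "bread", "milk")
eggs_bread = rule_measures(baskets, "eggs", "bread")
print(bread_milk)
print(eggs_bread)
```

Here "bread implies milk" has perfect confidence but a lift of 1, because milk is in every basket anyway, so the rule carries no new knowledge. This is the kind of pattern the evaluation step is meant to filter out.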
- Knowledge Presentation
- Visualization and knowledge representation techniques are used to present mined knowledge to users.
- Visualizations can take the form of graphs, charts, or tables.
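As a small illustration of presentation, the sketch below renders mined pattern scores as a plain-text bar chart. A real system would more likely use a plotting library; the `bar_chart` helper and the scores are hypothetical.

```python
def bar_chart(scores, width=20):
    """Render label -> score pairs as a text bar chart, highest score first."""
    top = max(scores.values())
    lines = []
    for label, value in sorted(scores.items(), key=lambda kv: -kv[1]):
        # Bar length is proportional to the score relative to the best one.
        bar = "#" * round(width * value / top)
        lines.append(f"{label:<12} {bar} {value:.2f}")
    return "\n".join(lines)

# Hypothetical support values for two mined patterns.
scores = {"bread+milk": 0.75, "butter+milk": 0.5}
chart = bar_chart(scores)
print(chart)
```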