What is meant by metadata in the context of a data warehouse? Explain the different types of metadata stored in a data warehouse.
Metadata are data about data. When used in a data warehouse, metadata are the data that define warehouse objects.
Metadata are created for the data names and definitions of the given warehouse.
Additional metadata are created and captured to record the timestamp of any extracted data, the source of the extracted data, and any missing fields that have been added by data cleaning or integration processes.
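As a rough illustration of the capture step described above, the following Python sketch fills in missing fields during cleaning and records the source, the extraction timestamp and the names of the fields that were added. The names ExtractionMetadata, clean_record and the sample source "orders_db.sales" are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExtractionMetadata:
    """Metadata captured alongside one extracted record (illustrative only)."""
    source: str                 # where the data was extracted from
    extracted_at: datetime      # timestamp of the extraction
    filled_fields: list = field(default_factory=list)  # fields added by cleaning


def clean_record(record, defaults, source):
    """Fill missing fields from defaults and note which ones were added."""
    cleaned = dict(record)
    filled = []
    for name, default in defaults.items():
        if cleaned.get(name) is None:
            cleaned[name] = default
            filled.append(name)
    meta = ExtractionMetadata(
        source=source,
        extracted_at=datetime.now(timezone.utc),
        filled_fields=filled,
    )
    return cleaned, meta


# Hypothetical record with one empty field and one absent field.
row, meta = clean_record(
    {"id": 7, "region": None},
    {"region": "UNKNOWN", "currency": "USD"},
    source="orders_db.sales",
)
```

Here meta.filled_fields lists the fields that the cleaning step supplied, so a later reader of the warehouse can tell original values from defaults.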
Metadata act as a directory. This directory helps the decision support system to locate the contents of a data warehouse.
In this sense, metadata are the road map to a data warehouse.
Metadata can be broadly categorized into three types:
Business Metadata: It has the data ownership information, business definition and changing policies.
Technical Metadata: It includes database system names, table and column names and sizes, data types and allowed values. It also includes structural information such as primary and foreign key attributes and indices.
Operational Metadata: It includes the currency of data and data lineage. Currency of data indicates whether the data is active, archived or purged. Lineage of data is the history of the data's migration and the transformations applied to it.
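The three categories above can be pictured as three small records attached to one warehouse table. The Python sketch below is only a hypothetical layout; the class names, field names and sample values are assumptions made for illustration and do not come from any particular warehouse product.

```python
from dataclasses import dataclass
from enum import Enum


class DataCurrency(Enum):
    """Currency of data: active, archived or purged."""
    ACTIVE = "active"
    ARCHIVED = "archived"
    PURGED = "purged"


@dataclass
class BusinessMetadata:
    owner: str            # data ownership information
    definition: str       # business definition
    change_policy: str    # changing policy


@dataclass
class ColumnInfo:
    name: str
    data_type: str
    is_primary_key: bool = False


@dataclass
class TechnicalMetadata:
    database: str
    table: str
    columns: list         # list of ColumnInfo


@dataclass
class OperationalMetadata:
    currency: DataCurrency
    lineage: list         # ordered history of migrations and transformations


@dataclass
class TableMetadata:
    business: BusinessMetadata
    technical: TechnicalMetadata
    operational: OperationalMetadata


# Hypothetical metadata for a sales fact table.
sales_meta = TableMetadata(
    business=BusinessMetadata("Sales department", "One row per sold item", "Append only"),
    technical=TechnicalMetadata(
        database="warehouse",
        table="sales_fact",
        columns=[ColumnInfo("sale_id", "INTEGER", is_primary_key=True),
                 ColumnInfo("amount", "DECIMAL(10,2)")],
    ),
    operational=OperationalMetadata(
        currency=DataCurrency.ACTIVE,
        lineage=["extracted from orders_db", "currency converted to USD"],
    ),
)

primary_keys = [c.name for c in sales_meta.technical.columns if c.is_primary_key]
```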
Metadata play a different role from other data warehouse data and are important for many reasons. For example, metadata are used as a directory to help the decision support system analyst locate the contents of the data warehouse, and as a guide to the data mapping when data are transformed from the operational environment to the data warehouse environment.
Metadata also serve as a guide to the algorithms used for summarization between the current detailed data and the lightly summarized data, and between the lightly summarized data and the highly summarized data.
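To make the directory and summarization roles concrete, here is a minimal Python sketch of an in-memory metadata directory. All table names, locations and algorithm descriptions are invented for the example; a real warehouse would keep this information in a persistent metadata repository.

```python
# Hypothetical metadata directory: one entry per warehouse object.
directory = {
    "sales_detail": {
        "location": "warehouse.sales.sales_detail",
        "level": "current detailed",
        "derived_from": None,
        "algorithm": None,
    },
    "daily_sales": {
        "location": "warehouse.sales.daily_sales",
        "level": "lightly summarized",
        "derived_from": "sales_detail",
        "algorithm": "sum(amount) grouped by store, day",
    },
    "monthly_sales": {
        "location": "warehouse.sales.monthly_sales",
        "level": "highly summarized",
        "derived_from": "daily_sales",
        "algorithm": "sum(daily_total) grouped by region, month",
    },
}


def locate(name):
    """Directory role: tell the analyst where an object is stored."""
    return directory[name]["location"]


def summarization_path(name):
    """Guide role: walk from a summary back to the detailed data,
    listing the algorithm used at each summarization step."""
    path = []
    while name is not None:
        entry = directory[name]
        path.append((name, entry["level"], entry["algorithm"]))
        name = entry["derived_from"]
    return path


steps = summarization_path("monthly_sales")
```

Following the derived_from links from monthly_sales leads through daily_sales to sales_detail, which mirrors the chain from highly summarized to lightly summarized to current detailed data.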
Metadata should be stored and managed persistently (i.e., on disk).