Normalization in database

 


  • Normalization is a formal technique for analyzing relations based on their primary key ( or candidate key in the case of BCNF) and functional dependencies.
  • Normalization is often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form.
  • Three normal forms were initially proposed, which are called first (1NF), second (2NF) and third (3NF) normal form. Subsequently, a stronger definition of third normal form was introduced and is referred as Boyce-Codd normal form (BCNF). All these normal forms are based on the functional dependencies among the attribute of a relation.
Steps in Normalization:



First Normal Form
  1. The cells of the table must have single value
  2. Neither repeating group nor array are allowed as values.
  3. All entries in any column (attribute) must be of the same kind.
  4. Each column must have a unique name, but the order of the column in the table is insignificant.
  5. No two rows in a table may be identified and the order of rows is insignificant.
Second Normal Form
Deal with partial dependencies. A relation R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.
  1. Anomalies can occur when attributes are dependent on only part of a multi-attributes key.
  2. A relation is in second normal form when all non-key attributes are dependent on the candidate key.
  3. That is no attributes is dependent on only a part of the key.
  4. Any relation having a key with a single attribute is in second normal form.
Third Normal Form
Deal with transitive dependencies.
  1. Anomalies can occur when relation contain one or more transitive dependencies.
  2. A transitive dependency A→C exists when A→B, B→C and not B→A.
  3. A relation is in 3NF when it is in 2NF and has no transitive dependencies.
According to codd's original definition, a relation schema R is in 3NF if and only if both of the following conditions hold.
  1. The relation R is in second normal form (2NF).
  2. Every non-prime attribute of R is non-transitively dependent on every candidate key of R.
Boyce-Codd Normal Form
Takes care of all candidate keys.
  1. Anomalies can occur in relations in 3NF if there are determinants in the relation that are not candidate keys.
  2. A relation is in BCNF if every determinant is a candidate key. The left-hand side of the determined is called determinant.
  3. To test whether a relation is in BCNF, we identify all the determinants and make sure that they are candidate keys.
Fourth Normal Form
Deals with Multivalued dependencies.
  1. Multivalued dependencies can occur in relations in BCNF.
  2. A multivalued dependency (MVD) occurs when a single attribute say A, multidetermined two other independent attributes, say B and C. That is A →→B and A →→C.
  3. X→→Y i.e.X multidetermined Y, when for each value of X we can have more than one value of Y.
Fourth normal form (4NF) is a stronger normal form than BCNF as it prevents relations from containing nontrivial MVDs and hence data redundancy. The normalization of BCNF relations to 4NF involves the normal of the MVD from the relation by placing the attribute(s) in a new relation along with a copy of the determinant(s).