In the earlier articles, we have learned about various concepts related to database management systems. In this article, you will learn about what is normalization, how it is performed etc.
In case, if you are new to DBMS concepts, I would suggest you to go through the following article, to understand about database systems, before reading this article.
Normalization in DBMS studies is a technique which is used for the design of DBMS. It is the process of analyzing given database (normally relational database) to remove redundancy and increase efficiency by making changes to the tables.
This technique is first proposed by CODD in 1972.
This process includes a series of tests, which are used for designing the data base management system. These tests or constraints help in creating tables, that are independent are more useful in database.
These constraints are checked and data is modified as per the constraints in a sequence. Codd proposed 3 phases initially. These phases are called first normal form (1 NF), second normal form (2 NF) and third normal form (3 NF) respectively. Other normal forms are developed later.
Following are the advantages of normalization.
- Normalization reduces redundancy.
- Insertion, deletion and updation becomes easy to handle.
This article describes about 1 NF, 2 NF and 3 NF.
First Normal Form (1 NF) :
In the given table of database, all attributes must be atomic. In other words, the records indicating the attributes of a feature must be single valued only.
If a given table obeys the above rule it is said to be in first normal form or 1 NF.
For example, in database of students, student Id is normally an atomic attribute but phone number is a multi valued attribute. Because, a student will have only one Id, where as same student may have multiple phone numbers.
The given un-normalized table can be split up into several tables for achieving first normal form.
Second Normal Form:
Following are various criteria that are to be applied for getting database into second normal form.
A data table should be in the first normal form.
Further, there should not be any partial dependency of any attribute columns on the primary key.
For example, in a database of a coaching center, details such as student Id, name, phone number, course, batch id, batch time etc., may be available. Of these details in the database, if student id is considered as primary key, name and phone number may depend on that student id. However batch timing is not directly related to student, instead it is related to batch number. This means some data is primarily dependent on one attribute, while other data is primarily dependent on other attribute. This is called partial dependency. In such situations, table is split based on the key attributes and both key attributes are linked using a separate table.
Third Normal Form:
Following are the criteria for third normal form.
- The table must be in the second normal form before.
- Every non-prime attribute of table is non-transiently dependent on every key of table.
If several components of data are dependent on each other, rather than one single attribute, the data is said to be transiently dependent.
For example, student Id, name, department, and department head are attribute heads in a database table. If student id is the primary attribute, then name is dependent on student id. Department may also be dependent on student id, however, department head is dependent on department rather than student id. However, department is related to student id, hence department is transiently related to student id.
Such relations should not exist in the data base, if any such relations are present separate tables are made to bring the data into third normal form or 3 NF.
1. Tutorials Point : Link
2. Beginners Book : Link
Related Video Source:
- Normalization and First Normal Form
2. Second Normal Form (2 NF)
3. Third Normal Form (3 NF)