Picture credits: Pawel Czerwinski

Normalization : A systematic approach to designing relational database schemas

Aug 22, 2023

Normalisation is the process divides larger tables into smaller, more manageable pieces and links them using relationships. The "normal forms" are a series of steps or criteria to achieve this goal. Normal forms (1NF, 2NF, 3NF, etc.) are rules that tables follow to qualify as relations and avoid redundancy.


First Normal Form (1NF)

  • Rule: A table is in 1NF if it only contains atomic (indivisible) values; there are no repeating groups or arrays.
  • Benefit: Eliminates duplicate columns and groups data by logical units.
  • Example: Consider a table where a student is associated with multiple courses in a single row, e.g., Course1, Course2, Course3, etc. This table is not in 1NF. To bring it into 1NF, you'd have each course as a separate row.

Original Table (Not in 1NF)

---------------------------
| Student | Course1 | Course2 | Course3 |
---------------------------
| Alice   | Math    | English | Science |
| Bob     | History | Math    | -       |
---------------------------

In the table above, each student is associated with multiple courses in a single row. This structure makes it difficult to add more courses for a student without altering the table schema (adding more columns). Moreover, a student with less than 3 courses results in NULL values (or blank spaces, as shown with Bob).

Converted Table (In 1NF)

----------------------
| Student | Course   |
----------------------
| Alice   | Math     |
| Alice   | English  |
| Alice   | Science  |
| Bob     | History  |
| Bob     | Math     |
----------------------

By looking at the 1NF table, you can easily add more courses for each student, remove courses, or query for specific course enrollments. It's a more flexible and normalized design.


After achieving 1NF, the next step is 2NF (Second Normal Form).

2NF (Second Normal Form) Definition:

A table is in 2NF if:

  1. It is in 1NF (has no repeating groups or arrays and has a primary key).
  2. All non-key attributes (columns) are fully functionally dependent on the entire primary key.

In simpler terms, 2NF mainly addresses issues in tables with composite primary keys (tables where the primary key is made up of more than one column). In such tables, all columns should depend on the entire composite key for their existence and not just a part of it.


Example:

Let's consider a university database where students are enrolled in multiple courses, and these courses are taught by various instructors. The primary key for the table might be a composite of both StudentID and CourseID because the combination of both uniquely identifies each row.

----------------------------------------
| StudentID | CourseID | CourseName | Instructor |
----------------------------------------
| 1001      | CSE101   | Math       | Mr. A      |
| 1001      | CSE102   | Physics    | Mr. B      |
| 1002      | CSE101   | Math       | Mr. A      |
----------------------------------------

In the table above, the combination of StudentID and CourseID is the composite primary key. Here's the problem: The Instructor column is dependent only on the CourseID (because a course has a specific instructor), not on the combination of StudentID and CourseID. Thus, we have partial dependency, which violates 2NF.

Conversion to 2NF:

To bring the table to 2NF, we'd split the table into two:

StudentCourses Table:

---------------------------
| StudentID | CourseID   |
---------------------------
| 1001      | CSE101     |
| 1001      | CSE102     |
| 1002      | CSE101     |
---------------------------

Courses Table:

-----------------------------
| CourseID | CourseName | Instructor |
-----------------------------
| CSE101   | Math       | Mr. A      |
| CSE102   | Physics    | Mr. B      |
-----------------------------

Now, the StudentCourses table relates students to courses without holding information about the course itself. The Courses table, on the other hand, contains details about each course, including who instructs it. This new structure adheres to 2NF because there's no partial dependency in any table.