The normalization process aims to minimize data duplications, avoid errors during data modifications, and simplify data queries from the database. The three fundamental normalization forms are known as:

In this reading, you will learn how to apply the rules that ensure that a database meets the criteria of these three normal forms.

The following example includes fictitious data required by a Medical Group Surgery based in London to generate relevant reports. Doctors work in multiple regions and various councils in London. And once patients book an appointment, they are given a slot ID at their local surgery. There might be multiple surgeries in the same council but with different postcodes, where one or more councils belong to a particular region. For example, East or West London.

Doctor ID Doctor name Region Patient ID Patient name Surgery Number Surgery council Postcode Slot ID Total Cost
D1 Karl West London P1
P2
P3 Rami
Kim
Nora 3 Harrow HA9SDE A1
A2
A3 1500 1200 1600
D1 Karl East London P4
P5 Kamel
Sami 4 Hackney E1 6AW A1
A2 2500 1000
D2 Mark East London P5
P6 Sami
Norma 4 Hackney E1 6AW A3
A4 1500 2000
D2 Mark West London P7
P1 Rose
Rami 5 Harrow HA862E A4
A5 1000
1500

The data listed in the table are in an unnormalized form. Repeating groups of data appear in many cases, for instance, doctors, regions, and council names. There are also multiple instances of data stored in the same cell, such as with the patient name and total cost columns. This makes it difficult to update and query data.  Moreover, it is not easy to choose a unique key and assign it as a primary key.

This unnormalized table can be written in SQL form as follows:

CREATE TABLE Surgery 
    (DoctorID VARCHAR(10), 
    DoctorName VARCHAR(50), 
    Region VARCHAR(20), 
    PatientID VARCHAR(10), 
    PatientName VARCHAR(50), 
    SurgeryNumber INT, Council VARCHAR(20), 
    Postcode VARCHAR(10), 
    SlotID VARCHAR(5), 
    TotalCost Decimal);

First normal form

To simplify the data structure of the surgery table, let’s apply the first normal form rules to enforce the data atomicity rule and eliminate unnecessary repeating data groups. The data atomicity rule means you can only have one single instance value of the column attribute in any table cell.

The atomicity problem only exists in the columns of data related to the patients. Therefore, it is important to create a new table for patient data to fix this. In other words, you can organize all data related to the patient entity in one separate table, where each column cell contains only one single instance of data, as depicted in the following example:

Patient ID Patient name Slot ID Total Cost
P1 Rami A1 1500
P2 Kim A2 1200
P3 Nora A3 1600
P4 Kamel A1 2500
P5 Sami A2 1000
P6 Norma A5 2000
P7 Rose A6 1000

This table includes one single instance of data in each cell, which makes it much simpler to read and understand. However, the patient table requires two columns, the patient ID and the Slot ID, to identify each record uniquely. This means that you need a composite primary key in this table. To create this table in SQL you can write the following code:

CREATE TABLE Patient
 (PatientID VARCHAR(10) NOT NULL, 
  PatientName VARCHAR(50), 
  SlotID VARCHAR(10) NOT NULL, 
  TotalCost Decimal,  
  CONSTRAINT PK_Patient 
  PRIMARY KEY (PatientID, SlotID));  

Once you have removed the patient attributes from the main table, you just have the doctor ID, name, region, surgery number, council and postcode columns left in the table.

Doctor ID Doctor name Region Surgery Number Surgery council Postcode
D1 Karl West London 3 Harrow HA9SDE
D1 Karl East London 4 Hackney E1 6AW
D2 Mark West London 4 Hackney E1 6AW
D2 Mark East London 5 Harrow HA862E

You may have noticed that the table also contains repeating groups of data in each column. You can fix this by separating the table into two tables of data: the doctor table and the surgery table, where each table deals with one specific entity.

Doctor table