The normalization process aims to minimize data duplications, avoid errors during data modifications, and simplify data queries from the database. The three fundamental normalization forms are known as:
In this reading, you will learn how to apply the rules that ensure that a database meets the criteria of these three normal forms.
The following example includes fictitious data required by a Medical Group Surgery based in London to generate relevant reports. Doctors work in multiple regions and various councils in London. And once patients book an appointment, they are given a slot ID at their local surgery. There might be multiple surgeries in the same council but with different postcodes, where one or more councils belong to a particular region. For example, East or West London.
Doctor ID | Doctor name | Region | Patient ID | Patient name | Surgery Number | Surgery council | Postcode | Slot ID | Total Cost |
---|---|---|---|---|---|---|---|---|---|
D1 | Karl | West London | P1 | ||||||
P2 | |||||||||
P3 | Rami | ||||||||
Kim | |||||||||
Nora | 3 | Harrow | HA9SDE | A1 | |||||
A2 | |||||||||
A3 | 1500 1200 1600 | ||||||||
D1 | Karl | East London | P4 | ||||||
P5 | Kamel | ||||||||
Sami | 4 | Hackney | E1 6AW | A1 | |||||
A2 | 2500 1000 | ||||||||
D2 | Mark | East London | P5 | ||||||
P6 | Sami | ||||||||
Norma | 4 | Hackney | E1 6AW | A3 | |||||
A4 | 1500 2000 | ||||||||
D2 | Mark | West London | P7 | ||||||
P1 | Rose | ||||||||
Rami | 5 | Harrow | HA862E | A4 | |||||
A5 | 1000 | ||||||||
1500 |
The data listed in the table are in an unnormalized form. Repeating groups of data appear in many cases, for instance, doctors, regions, and council names. There are also multiple instances of data stored in the same cell, such as with the patient name and total cost columns. This makes it difficult to update and query data. Moreover, it is not easy to choose a unique key and assign it as a primary key.
This unnormalized table can be written in SQL form as follows:
CREATE TABLE Surgery
(DoctorID VARCHAR(10),
DoctorName VARCHAR(50),
Region VARCHAR(20),
PatientID VARCHAR(10),
PatientName VARCHAR(50),
SurgeryNumber INT, Council VARCHAR(20),
Postcode VARCHAR(10),
SlotID VARCHAR(5),
TotalCost Decimal);
To simplify the data structure of the surgery table, let’s apply the first normal form rules to enforce the data atomicity rule and eliminate unnecessary repeating data groups. The data atomicity rule means you can only have one single instance value of the column attribute in any table cell.
The atomicity problem only exists in the columns of data related to the patients. Therefore, it is important to create a new table for patient data to fix this. In other words, you can organize all data related to the patient entity in one separate table, where each column cell contains only one single instance of data, as depicted in the following example:
Patient ID | Patient name | Slot ID | Total Cost |
---|---|---|---|
P1 | Rami | A1 | 1500 |
P2 | Kim | A2 | 1200 |
P3 | Nora | A3 | 1600 |
P4 | Kamel | A1 | 2500 |
P5 | Sami | A2 | 1000 |
P6 | Norma | A5 | 2000 |
P7 | Rose | A6 | 1000 |
This table includes one single instance of data in each cell, which makes it much simpler to read and understand. However, the patient table requires two columns, the patient ID and the Slot ID, to identify each record uniquely. This means that you need a composite primary key in this table. To create this table in SQL you can write the following code:
CREATE TABLE Patient
(PatientID VARCHAR(10) NOT NULL,
PatientName VARCHAR(50),
SlotID VARCHAR(10) NOT NULL,
TotalCost Decimal,
CONSTRAINT PK_Patient
PRIMARY KEY (PatientID, SlotID));
Once you have removed the patient attributes from the main table, you just have the doctor ID, name, region, surgery number, council and postcode columns left in the table.
Doctor ID | Doctor name | Region | Surgery Number | Surgery council | Postcode |
---|---|---|---|---|---|
D1 | Karl | West London | 3 | Harrow | HA9SDE |
D1 | Karl | East London | 4 | Hackney | E1 6AW |
D2 | Mark | West London | 4 | Hackney | E1 6AW |
D2 | Mark | East London | 5 | Harrow | HA862E |
You may have noticed that the table also contains repeating groups of data in each column. You can fix this by separating the table into two tables of data: the doctor table and the surgery table, where each table deals with one specific entity.