Database Normalization and Denormalization: A Complete Guide
Understanding Normal Forms and When to Denormalize
Overview
Today, we’ll explore the concepts and applications of database normalization and denormalization. In database design, normalization is the process of maintaining data consistency and minimizing redundancy, while denormalization involves storing some data redundantly to optimize performance. By appropriately utilizing these two concepts, we can create efficient and scalable database designs.
What is Normalization?
Normalization is the process of organizing data to reduce redundancy, maintain integrity, and ensure data consistency. This is achieved by converting data into specific formats called Normal Forms (NF).
Goals of Normalization
Minimizing Data Redundancy
Maintaining Data Integrity
Ensuring Data Consistency
Preventing Anomalies:
Insertion Anomaly
Deletion Anomaly
Update Anomaly
Advantages of Normalization
Saves storage space
Makes data modification and management easier
Ensures data integrity
Disadvantages of Normalization
May lead to performance degradation due to increased JOIN operations
Increases query complexity
Normal Forms (NF)
Normalization progresses through stages: 1NF → 2NF → 3NF → BCNF → 4NF → 5NF. Generally, normalization is performed up to 3NF (Third Normal Form) or BCNF.
1️⃣ First Normal Form (1NF)
Requirements
All columns must contain atomic values
No multiple values in a single cell
Example
Before 1NF:
OrderID
CustomerName
Products
101
John Doe
TV, Refrigerator
102
Jane Smith
Washing Machine
After 1NF:
OrderID
CustomerName
Product
101
John Doe
TV
101
John Doe
Refrigerator
102
Jane Smith
Washing Machine
2️⃣ Second Normal Form (2NF)
Requirements
Must satisfy 1NF
Remove partial functional dependencies
All non-key attributes must depend on the entire primary key
Example
Before 2NF:
OrderID
CustomerName
CustomerAddress
Product
101
John Doe
Seoul
TV
101
John Doe
Seoul
Refrigerator
102
Jane Smith
Busan
Washing Machine
After 2NF:
Customer Table:
CustomerName
CustomerAddress
John Doe
Seoul
Jane Smith
Busan
Order Table:
OrderID
CustomerName
Product
101
John Doe
TV
101
John Doe
Refrigerator
102
Jane Smith
Washing Machine
3️⃣ Third Normal Form (3NF)
Requirements
Must satisfy 2NF
Remove transitive dependencies
No non-key attribute should depend on another non-key attribute
Example
Before 3NF:
OrderID
CustomerName
CustomerAddress
CustomerGrade
101
John Doe
Seoul
VIP
102
Jane Smith
Busan
Regular
After 3NF:
Customer Table:
CustomerName
CustomerAddress
CustomerGrade
John Doe
Seoul
VIP
Jane Smith
Busan
Regular
Order Table:
OrderID
CustomerName
101
John Doe
102
Jane Smith
BCNF (Boyce-Codd Normal Form)
Requirements
Must satisfy 3NF
Every determinant must be a candidate key
Example
Before BCNF:
DeptID
Professor
Course
CS01
Prof. Kim
Database
CS01
Prof. Kim
OS
CS02
Prof. Park
Data Structures
After BCNF:
Professor-Department Table:
Professor
DeptID
Prof. Kim
CS01
Prof. Park
CS02
Course Table:
DeptID
Course
CS01
Database
CS01
OS
CS02
Data Structures
4NF (Fourth Normal Form)
Requirements
Must satisfy BCNF
Remove multivalued dependencies
Split tables with multiple independent multivalued dependencies
Example
Before 4NF:
StudentID
Course
Club
S001
Database
Soccer
S001
OS
Soccer
S001
Database
Reading
S001
OS
Reading
After 4NF:
Student-Course Table:
StudentID
Course
S001
Database
S001
OS
Student-Club Table:
StudentID
Club
S001
Soccer
S001
Reading
5NF (Fifth Normal Form)
Requirements
Must satisfy 4NF
Remove join dependencies
Split tables to maintain data integrity
Example
Before 5NF:
StudentID
InstructorID
Course
S001
I100
Database
S002
I100
Database
S001
I200
OS
S002
I300
Data Structures
After 5NF:
Student-Instructor Table:
StudentID
InstructorID
S001
I100
S002
I100
S001
I200
S002
I300
Instructor-Course Table:
InstructorID
Course
I100
Database
I200
OS
I300
Data Structures
What is Denormalization?
Denormalization is the process of intentionally introducing redundancy into a database to improve performance. It involves storing some data redundantly to enhance query speed.
When to Apply Denormalization
When JOIN operations cause performance degradation
When read operations are frequent and write operations are rare
When real-time response speed is critical
Example
Normalized Structure (Requires JOIN):
-- Customer TableCREATETABLEcustomers(customer_idINTPRIMARYKEY,customer_nameVARCHAR(100));-- Order TableCREATETABLEorders(order_idINTPRIMARYKEY,customer_idINT,productVARCHAR(100),FOREIGNKEY(customer_id)REFERENCEScustomers(customer_id));
Comments