Database Design Best Practices: Normalization and Performance

Introduction

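Schema design shapes query performance, data integrity, and long-term maintainability. This guide starts from a normalized (3NF) design for a sample e-commerce schema, then shows where denormalization, indexing, and partitioning justify trading some normalization for performance.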

Prerequisites

SQL Server 2016 or later (examples use T-SQL; other RDBMSs need syntax adjustments)
Understanding of relational concepts
Sample schema for examples

Normalization Levels

Normal Form	Rule	Purpose
1NF	Atomic values, no repeating groups	Eliminate multi-valued and repeated columns
2NF	1NF + no partial dependencies	Remove attributes that depend on only part of a composite key
3NF	2NF + no transitive dependencies	Remove attributes that depend on other non-key attributes
BCNF	3NF + every determinant is a candidate key	Remove anomalies left by overlapping candidate keys
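
To make the 3NF rule concrete, the sketch below shows a table with a transitive dependency and one way to decompose it. The table names ProductFlat, CategoryRef, and ProductRef are hypothetical and exist only for this illustration; they are not part of the sample schema used later.

```sql
-- Violates 3NF: CategoryName depends on CategoryID (a non-key column),
-- so renaming a category means updating every product row in it.
CREATE TABLE ProductFlat (
    ProductID INT PRIMARY KEY,
    ProductName NVARCHAR(100) NOT NULL,
    CategoryID INT NOT NULL,
    CategoryName NVARCHAR(50) NOT NULL
);

-- 3NF: the category name is stored once, keyed by CategoryID.
CREATE TABLE CategoryRef (
    CategoryID INT PRIMARY KEY,
    CategoryName NVARCHAR(50) NOT NULL
);

CREATE TABLE ProductRef (
    ProductID INT PRIMARY KEY,
    ProductName NVARCHAR(100) NOT NULL,
    CategoryID INT NOT NULL REFERENCES CategoryRef(CategoryID)
);
```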

Step-by-Step Guide

Step 1: Identify Entities & Relationships

Example Domain: E-commerce

Entities: Customer, Order, Product, Category, OrderItem
Relationships: Customer → Order (1:M), Order → OrderItem (1:M), Product → OrderItem (1:M), Category → Product (1:M)

Step 2: Design Schema (3NF)

CREATE TABLE Customer (
    CustomerID INT PRIMARY KEY IDENTITY,
    FirstName NVARCHAR(50) NOT NULL,
    LastName NVARCHAR(50) NOT NULL,
    Email NVARCHAR(100) UNIQUE NOT NULL,
    CreatedDate DATETIME2 DEFAULT GETDATE()
);

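-- Category must exist before Product, which references it (minimal definition)
CREATE TABLE Category (
    CategoryID INT PRIMARY KEY IDENTITY,
    CategoryName NVARCHAR(50) UNIQUE NOT NULL
);

CREATE TABLE Product (
    ProductID INT PRIMARY KEY IDENTITY,
    ProductName NVARCHAR(100) NOT NULL,
    CategoryID INT NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL,
    CONSTRAINT FK_Product_Category FOREIGN KEY (CategoryID) REFERENCES Category(CategoryID)
);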

CREATE TABLE [Order] (
    OrderID INT PRIMARY KEY IDENTITY,
    CustomerID INT NOT NULL,
    OrderDate DATETIME2 DEFAULT GETDATE(),
    Status NVARCHAR(20) NOT NULL,
    CONSTRAINT FK_Order_Customer FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
);

CREATE TABLE OrderItem (
    OrderItemID INT PRIMARY KEY IDENTITY,
    OrderID INT NOT NULL,
    ProductID INT NOT NULL,
    Quantity INT NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL, -- price at time of order; may differ from the current Product.UnitPrice
    CONSTRAINT FK_OrderItem_Order FOREIGN KEY (OrderID) REFERENCES [Order](OrderID),
    CONSTRAINT FK_OrderItem_Product FOREIGN KEY (ProductID) REFERENCES Product(ProductID)
);

Step 3: Apply Constraints & Defaults

ALTER TABLE Product ADD CONSTRAINT CK_Product_UnitPrice CHECK (UnitPrice >= 0);
ALTER TABLE OrderItem ADD CONSTRAINT CK_OrderItem_Quantity CHECK (Quantity > 0);
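
The step title also mentions defaults, so here is a small sketch of one, followed by an insert that the quantity check should reject. The constraint name DF_Order_Status and the literal IDs are assumptions for illustration; the insert presumes that order 1 and product 1 already exist.

```sql
-- New orders start as 'Pending' unless the caller supplies a status.
ALTER TABLE [Order] ADD CONSTRAINT DF_Order_Status DEFAULT 'Pending' FOR Status;

-- Rejected by CK_OrderItem_Quantity because Quantity must be greater than 0.
INSERT INTO OrderItem (OrderID, ProductID, Quantity, UnitPrice)
VALUES (1, 1, 0, 9.99);
```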

Step 4: Strategic Denormalization

Scenario: Reporting query frequently needs total order value

Normalized (Expensive):

SELECT o.OrderID, SUM(oi.Quantity * oi.UnitPrice) AS TotalAmount
FROM [Order] o
JOIN OrderItem oi ON o.OrderID = oi.OrderID
GROUP BY o.OrderID;

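Denormalized (Add a stored total column):

-- A computed column cannot contain a subquery or reference another table,
-- so store the total in a regular column and keep it in sync on writes
-- (for example with a trigger on OrderItem or in the application's write path).
ALTER TABLE [Order] ADD TotalAmount DECIMAL(12,2) NOT NULL
    CONSTRAINT DF_Order_TotalAmount DEFAULT 0;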
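
One possible way to keep a denormalized total in sync is a trigger on OrderItem. The sketch below assumes TotalAmount is an ordinary DECIMAL column on [Order]; the trigger name TR_OrderItem_SyncTotal is hypothetical. CREATE TRIGGER must be the first statement in its batch.

```sql
CREATE TRIGGER TR_OrderItem_SyncTotal
ON OrderItem
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Recalculate totals only for orders touched by this statement.
    UPDATE o
    SET TotalAmount = COALESCE((
            SELECT SUM(oi.Quantity * oi.UnitPrice)
            FROM OrderItem AS oi
            WHERE oi.OrderID = o.OrderID), 0)
    FROM [Order] AS o
    WHERE o.OrderID IN (
        SELECT OrderID FROM inserted
        UNION
        SELECT OrderID FROM deleted
    );
END;
```

The trade-off is extra work on every OrderItem write in exchange for cheaper reads, which fits the read-heavy reporting scenario described above.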

Step 5: Indexing Strategy

Primary Keys (Clustered):

-- SQL Server creates a clustered index for a PRIMARY KEY by default,
-- so the tables from Step 2 need no extra statement here

Foreign Keys (Non-Clustered):

CREATE NONCLUSTERED INDEX IX_OrderItem_OrderID ON OrderItem(OrderID);
CREATE NONCLUSTERED INDEX IX_OrderItem_ProductID ON OrderItem(ProductID);

Covering Index for Common Query:

CREATE NONCLUSTERED INDEX IX_Order_CustomerID_Status
ON [Order](CustomerID, Status)
INCLUDE (OrderDate, TotalAmount);
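
As an illustration of what "covering" means here, the query below filters on the index key columns and returns only included columns, so it can typically be answered from IX_Order_CustomerID_Status without touching the base table. The customer ID is a hypothetical value.

```sql
DECLARE @CustomerID INT = 42;  -- hypothetical customer

-- Seeks on (CustomerID, Status) and reads OrderDate and TotalAmount from the index leaf.
SELECT OrderDate, TotalAmount
FROM [Order]
WHERE CustomerID = @CustomerID
  AND Status = 'Shipped';
```

Adding a column that is neither a key nor an INCLUDE column to the SELECT list would force a lookup into the clustered index.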

Step 6: Partitioning for Scale

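Horizontal Partitioning (Table Partitioning):

-- Partition by OrderDate range: before 2024, 2024, 2025, and 2026 onward
CREATE PARTITION FUNCTION PF_OrderDate (DATETIME2)
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2025-01-01', '2026-01-01');

-- All partitions share one filegroup here for simplicity
CREATE PARTITION SCHEME PS_OrderDate
AS PARTITION PF_OrderDate ALL TO ([PRIMARY]);

-- Partitioned version of [Order], created in place of the Step 2 table.
-- A unique index on a partitioned table must include the partitioning column,
-- so the primary key becomes (OrderID, OrderDate). Foreign keys that reference
-- OrderID alone then need a separate, non-aligned unique index on OrderID.
CREATE TABLE [Order] (
    OrderID INT IDENTITY NOT NULL,
    CustomerID INT NOT NULL,
    OrderDate DATETIME2 NOT NULL DEFAULT GETDATE(),
    Status NVARCHAR(20) NOT NULL,
    CONSTRAINT PK_Order PRIMARY KEY CLUSTERED (OrderID, OrderDate)
) ON PS_OrderDate(OrderDate);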
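
To check how rows are distributed across partitions, the $PARTITION function can be applied to the partitioning column. This is a small sketch that assumes the PF_OrderDate function above and a populated [Order] table.

```sql
-- Count rows per partition number as assigned by PF_OrderDate.
SELECT $PARTITION.PF_OrderDate(OrderDate) AS PartitionNumber,
       COUNT(*) AS OrderCount
FROM [Order]
GROUP BY $PARTITION.PF_OrderDate(OrderDate)
ORDER BY PartitionNumber;
```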

Step 7: Audit & History Tracking

Temporal Tables:

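-- Defaults are required when Customer already contains rows
ALTER TABLE Customer ADD
    SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START HIDDEN
        CONSTRAINT DF_Customer_SysStartTime DEFAULT SYSUTCDATETIME(),
    SysEndTime DATETIME2 GENERATED ALWAYS AS ROW END HIDDEN
        CONSTRAINT DF_Customer_SysEndTime DEFAULT CONVERT(DATETIME2, '9999-12-31 23:59:59.9999999'),
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);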

ALTER TABLE Customer SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));
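
Once system versioning is on, history can be read with the FOR SYSTEM_TIME clause. The sketch below uses a hypothetical customer ID and timestamp; period columns are recorded in UTC.

```sql
DECLARE @AsOf DATETIME2 = '2025-01-01T00:00:00';  -- hypothetical point in time (UTC)

-- The row as it looked at @AsOf.
SELECT CustomerID, FirstName, LastName, Email
FROM Customer FOR SYSTEM_TIME AS OF @AsOf
WHERE CustomerID = 42;

-- Every recorded version of the row, oldest first.
-- HIDDEN period columns must be named explicitly to appear in the result.
SELECT CustomerID, Email, SysStartTime, SysEndTime
FROM Customer FOR SYSTEM_TIME ALL
WHERE CustomerID = 42
ORDER BY SysStartTime;
```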

Performance Optimization Patterns

Pattern 1: Avoid EAV (Entity-Attribute-Value)

Anti-Pattern:

CREATE TABLE ProductAttribute (
    ProductID INT,
    AttributeName NVARCHAR(50),
    AttributeValue NVARCHAR(200)
);

Better (JSON for semi-structured):

ALTER TABLE Product ADD Attributes NVARCHAR(MAX);
-- Store: {"color": "red", "size": "large"}
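
A JSON column in NVARCHAR(MAX) is not validated by default. As a sketch, a check constraint with ISJSON can reject malformed documents, and JSON_VALUE can extract a scalar for filtering (both available from SQL Server 2016). The constraint name is an assumption, and the Attributes column is presumed to exist already.

```sql
-- Reject values that are not well-formed JSON (NULL stays allowed).
ALTER TABLE Product ADD CONSTRAINT CK_Product_Attributes_IsJson
    CHECK (Attributes IS NULL OR ISJSON(Attributes) = 1);

-- Find products whose "color" attribute is "red".
SELECT ProductID, ProductName, JSON_VALUE(Attributes, '$.color') AS Color
FROM Product
WHERE JSON_VALUE(Attributes, '$.color') = 'red';
```

Attributes that are filtered on frequently are usually better promoted to real columns, since a JSON_VALUE predicate cannot use an ordinary index on the column.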

Pattern 2: Use Appropriate Data Types

-- Wrong: NVARCHAR(MAX) for short strings
-- Correct:
FirstName NVARCHAR(50)

-- Wrong: DATETIME for date-only
-- Correct:
BirthDate DATE

Pattern 3: Avoid SELECT *

-- Wrong: SELECT * FROM Customer
-- Correct:
SELECT CustomerID, FirstName, Email FROM Customer WHERE CustomerID = @id;

Data Integrity Enforcement

Referential Integrity

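-- Cascade delete order items when order deleted.
-- FK_OrderItem_Order already exists from Step 2, so drop it before
-- re-creating it with the cascade rule.
ALTER TABLE OrderItem DROP CONSTRAINT FK_OrderItem_Order;
ALTER TABLE OrderItem
ADD CONSTRAINT FK_OrderItem_Order
FOREIGN KEY (OrderID) REFERENCES [Order](OrderID) ON DELETE CASCADE;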

Check Constraints

ALTER TABLE [Order] ADD CONSTRAINT CK_Order_Status
CHECK (Status IN ('Pending', 'Shipped', 'Delivered', 'Cancelled'));

Unique Constraints

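-- Email is already declared UNIQUE in Step 2; create this index only
-- if the column was defined without that constraint.
CREATE UNIQUE INDEX UQ_Customer_Email ON Customer(Email);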

Troubleshooting Design Issues

Issue: Slow joins on large tables
Solution: Add covering indexes; review normalization vs denormalization trade-offs

Issue: Deadlocks on updates
Solution: Reduce transaction scope; use optimistic concurrency (rowversion)

Issue: Data anomalies (update/delete/insert)
Solution: Review normalization; ensure constraints enforce rules
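
For the deadlock item above, optimistic concurrency with rowversion might look like the following sketch. The column name RowVer and the literal values are hypothetical; in practice the application would pass back the version it read when it loaded the order.

```sql
-- Add a rowversion column; SQL Server updates it automatically on every row change.
ALTER TABLE [Order] ADD RowVer ROWVERSION;
GO  -- batch separator (SSMS/sqlcmd) so the next batch can see the new column

-- Hypothetical values captured when the application read the order.
DECLARE @OrderID INT = 1001;
DECLARE @OriginalRowVer BINARY(8) = 0x00000000000007D1;

UPDATE [Order]
SET Status = 'Shipped'
WHERE OrderID = @OrderID
  AND RowVer = @OriginalRowVer;  -- matches only if the row is unchanged since it was read

IF @@ROWCOUNT = 0
    RAISERROR('Order %d was changed by another session; reload and retry.', 16, 1, @OrderID);
```

Because no locks are held between the read and the write, transactions stay short, which tends to reduce the window in which deadlocks can occur.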

Best Practices Summary

Normalize to 3NF by default
Denormalize strategically for read-heavy workloads
Use surrogate keys (IDENTITY/GUID) for primary keys
Index foreign keys and frequently queried columns
Enforce integrity via constraints, not application code
Plan for scale with partitioning and archival strategies
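
As a sketch of the GUID option in the surrogate-key item above, a default of NEWSEQUENTIALID() generates roughly increasing values, which tends to fragment a clustered index less than random GUIDs. The CustomerSession table and its constraint names are hypothetical and not part of the sample schema.

```sql
CREATE TABLE CustomerSession (  -- hypothetical table for illustration
    SessionID UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_CustomerSession_SessionID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_CustomerSession PRIMARY KEY CLUSTERED,
    CustomerID INT NOT NULL
        CONSTRAINT FK_CustomerSession_Customer REFERENCES Customer(CustomerID),
    StartedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
```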

Key Takeaways

Proper normalization prevents data anomalies.
Strategic denormalization can speed up reads, at the cost of extra work on writes.
Indexes speed up reads and constraints protect integrity; both add write overhead, so apply them deliberately.
Temporal tables simplify audit requirements.

Next Steps

Perform schema review on existing databases
Identify denormalization candidates for reporting
Implement partitioning for historical data

Which schema will you optimize first?