Database Design Best Practices: Normalization and Performance

Database Design Best Practices: Normalization and Performance

Introduction

[Explain database design impact on query performance, data integrity, and maintainability; balance normalization vs performance.]

Prerequisites

  • SQL Server or equivalent RDBMS
  • Understanding of relational concepts
  • Sample schema for examples

Normalization Levels

Normal Form Rule Purpose
1NF Atomic values, no repeating groups Eliminate duplicate columns
2NF 1NF + no partial dependencies Remove redundant data
3NF 2NF + no transitive dependencies Eliminate derived data
BCNF 3NF + every determinant is a key Strict functional dependency

Step-by-Step Guide

Step 1: Identify Entities & Relationships

Example Domain: E-commerce

  • Entities: Customer, Order, Product, OrderItem
  • Relationships: Customer → Order (1:M), Order → OrderItem (1:M), Product → OrderItem (1:M)

Step 2: Design Schema (3NF)

CREATE TABLE Customer (
    CustomerID INT PRIMARY KEY IDENTITY,
    FirstName NVARCHAR(50) NOT NULL,
    LastName NVARCHAR(50) NOT NULL,
    Email NVARCHAR(100) UNIQUE NOT NULL,
    CreatedDate DATETIME2 DEFAULT GETDATE()
);

CREATE TABLE Product (
    ProductID INT PRIMARY KEY IDENTITY,
    ProductName NVARCHAR(100) NOT NULL,
    CategoryID INT NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL,
    CONSTRAINT FK_Product_Category FOREIGN KEY (CategoryID) REFERENCES Category(CategoryID)
);

CREATE TABLE [Order] (
    OrderID INT PRIMARY KEY IDENTITY,
    CustomerID INT NOT NULL,
    OrderDate DATETIME2 DEFAULT GETDATE(),
    Status NVARCHAR(20) NOT NULL,
    CONSTRAINT FK_Order_Customer FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
);

CREATE TABLE OrderItem (
    OrderItemID INT PRIMARY KEY IDENTITY,
    OrderID INT NOT NULL,
    ProductID INT NOT NULL,
    Quantity INT NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL,
    CONSTRAINT FK_OrderItem_Order FOREIGN KEY (OrderID) REFERENCES [Order](OrderID),
    CONSTRAINT FK_OrderItem_Product FOREIGN KEY (ProductID) REFERENCES Product(ProductID)
);

Step 3: Apply Constraints & Defaults

ALTER TABLE Product ADD CONSTRAINT CK_Product_UnitPrice CHECK (UnitPrice >= 0);
ALTER TABLE OrderItem ADD CONSTRAINT CK_OrderItem_Quantity CHECK (Quantity > 0);

Step 4: Strategic Denormalization

Scenario: Reporting query frequently needs total order value

Normalized (Expensive):

SELECT o.OrderID, SUM(oi.Quantity * oi.UnitPrice) AS TotalAmount
FROM [Order] o
JOIN OrderItem oi ON o.OrderID = oi.OrderID
GROUP BY o.OrderID;

Denormalized (Add computed column):

ALTER TABLE [Order] ADD TotalAmount AS (
    SELECT SUM(Quantity * UnitPrice)
    FROM OrderItem
    WHERE OrderItem.OrderID = [Order].OrderID
) PERSISTED;

Step 5: Indexing Strategy

Primary Keys (Clustered):

-- Automatically created with PRIMARY KEY

Foreign Keys (Non-Clustered):

CREATE NONCLUSTERED INDEX IX_OrderItem_OrderID ON OrderItem(OrderID);
CREATE NONCLUSTERED INDEX IX_OrderItem_ProductID ON OrderItem(ProductID);

Covering Index for Common Query:

CREATE NONCLUSTERED INDEX IX_Order_CustomerID_Status
ON [Order](CustomerID, Status)
INCLUDE (OrderDate, TotalAmount);

Step 6: Partitioning for Scale

Horizontal Partitioning (Sharding):

-- Partition by OrderDate range
CREATE PARTITION FUNCTION PF_OrderDate (DATETIME2)
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2025-01-01', '2026-01-01');

CREATE PARTITION SCHEME PS_OrderDate
AS PARTITION PF_OrderDate ALL TO ([PRIMARY]);

CREATE TABLE [Order] (
    OrderID INT PRIMARY KEY,
    OrderDate DATETIME2,
    -- other columns
) ON PS_OrderDate(OrderDate);

Step 7: Audit & History Tracking

Temporal Tables:

ALTER TABLE Customer ADD
    SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START HIDDEN,
    SysEndTime DATETIME2 GENERATED ALWAYS AS ROW END HIDDEN,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);

ALTER TABLE Customer SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));

Performance Optimization Patterns

Pattern 1: Avoid EAV (Entity-Attribute-Value)

Anti-Pattern:

CREATE TABLE ProductAttribute (
    ProductID INT,
    AttributeName NVARCHAR(50),
    AttributeValue NVARCHAR(200)
);

Better (JSON for semi-structured):

ALTER TABLE Product ADD Attributes NVARCHAR(MAX);
-- Store: {"color": "red", "size": "large"}

Pattern 2: Use Appropriate Data Types

-- Wrong: NVARCHAR(MAX) for short strings
-- Correct:
FirstName NVARCHAR(50)

-- Wrong: DATETIME for date-only
-- Correct:
BirthDate DATE

Pattern 3: Avoid SELECT *

-- Wrong: SELECT * FROM Customer
-- Correct:
SELECT CustomerID, FirstName, Email FROM Customer WHERE CustomerID = @id;

Data Integrity Enforcement

Referential Integrity

-- Cascade delete order items when order deleted
ALTER TABLE OrderItem
ADD CONSTRAINT FK_OrderItem_Order
FOREIGN KEY (OrderID) REFERENCES [Order](OrderID) ON DELETE CASCADE;

Check Constraints

ALTER TABLE [Order] ADD CONSTRAINT CK_Order_Status
CHECK (Status IN ('Pending', 'Shipped', 'Delivered', 'Cancelled'));

Unique Constraints

CREATE UNIQUE INDEX UQ_Customer_Email ON Customer(Email);

Troubleshooting Design Issues

Issue: Slow joins on large tables
Solution: Add covering indexes; review normalization vs denormalization trade-offs

Issue: Deadlocks on updates
Solution: Reduce transaction scope; use optimistic concurrency (rowversion)

Issue: Data anomalies (update/delete/insert)
Solution: Review normalization; ensure constraints enforce rules

Best Practices Summary

  • Normalize to 3NF by default
  • Denormalize strategically for read-heavy workloads
  • Use surrogate keys (IDENTITY/GUID) for primary keys
  • Index foreign keys and frequently queried columns
  • Enforce integrity via constraints, not application code
  • Plan for scale with partitioning and archival strategies

Key Takeaways

  • Proper normalization prevents data anomalies.
  • Strategic denormalization improves query performance.
  • Indexes and constraints balance speed and integrity.
  • Temporal tables simplify audit requirements.

Next Steps

  • Perform schema review on existing databases
  • Identify denormalization candidates for reporting
  • Implement partitioning for historical data

Additional Resources


Which schema will you optimize first?