Database Design Best Practices: Normalization and Performance
Introduction
[Explain database design impact on query performance, data integrity, and maintainability; balance normalization vs performance.]
Prerequisites
- SQL Server or equivalent RDBMS
- Understanding of relational concepts
- Sample schema for examples
Normalization Levels
| Normal Form | Rule | Purpose |
|---|---|---|
| 1NF | Atomic values, no repeating groups | Eliminate duplicate columns |
| 2NF | 1NF + no partial dependencies | Remove redundant data |
| 3NF | 2NF + no transitive dependencies | Eliminate derived data |
| BCNF | 3NF + every determinant is a key | Strict functional dependency |
Step-by-Step Guide
Step 1: Identify Entities & Relationships
Example Domain: E-commerce
- Entities: Customer, Order, Product, OrderItem
- Relationships: Customer → Order (1:M), Order → OrderItem (1:M), Product → OrderItem (1:M)
Step 2: Design Schema (3NF)
CREATE TABLE Customer (
CustomerID INT PRIMARY KEY IDENTITY,
FirstName NVARCHAR(50) NOT NULL,
LastName NVARCHAR(50) NOT NULL,
Email NVARCHAR(100) UNIQUE NOT NULL,
CreatedDate DATETIME2 DEFAULT GETDATE()
);
CREATE TABLE Product (
ProductID INT PRIMARY KEY IDENTITY,
ProductName NVARCHAR(100) NOT NULL,
CategoryID INT NOT NULL,
UnitPrice DECIMAL(10,2) NOT NULL,
CONSTRAINT FK_Product_Category FOREIGN KEY (CategoryID) REFERENCES Category(CategoryID)
);
CREATE TABLE [Order] (
OrderID INT PRIMARY KEY IDENTITY,
CustomerID INT NOT NULL,
OrderDate DATETIME2 DEFAULT GETDATE(),
Status NVARCHAR(20) NOT NULL,
CONSTRAINT FK_Order_Customer FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
);
CREATE TABLE OrderItem (
OrderItemID INT PRIMARY KEY IDENTITY,
OrderID INT NOT NULL,
ProductID INT NOT NULL,
Quantity INT NOT NULL,
UnitPrice DECIMAL(10,2) NOT NULL,
CONSTRAINT FK_OrderItem_Order FOREIGN KEY (OrderID) REFERENCES [Order](OrderID),
CONSTRAINT FK_OrderItem_Product FOREIGN KEY (ProductID) REFERENCES Product(ProductID)
);
Step 3: Apply Constraints & Defaults
ALTER TABLE Product ADD CONSTRAINT CK_Product_UnitPrice CHECK (UnitPrice >= 0);
ALTER TABLE OrderItem ADD CONSTRAINT CK_OrderItem_Quantity CHECK (Quantity > 0);
Step 4: Strategic Denormalization
Scenario: Reporting query frequently needs total order value
Normalized (Expensive):
SELECT o.OrderID, SUM(oi.Quantity * oi.UnitPrice) AS TotalAmount
FROM [Order] o
JOIN OrderItem oi ON o.OrderID = oi.OrderID
GROUP BY o.OrderID;
Denormalized (Add computed column):
ALTER TABLE [Order] ADD TotalAmount AS (
SELECT SUM(Quantity * UnitPrice)
FROM OrderItem
WHERE OrderItem.OrderID = [Order].OrderID
) PERSISTED;
Step 5: Indexing Strategy
Primary Keys (Clustered):
-- Automatically created with PRIMARY KEY
Foreign Keys (Non-Clustered):
CREATE NONCLUSTERED INDEX IX_OrderItem_OrderID ON OrderItem(OrderID);
CREATE NONCLUSTERED INDEX IX_OrderItem_ProductID ON OrderItem(ProductID);
Covering Index for Common Query:
CREATE NONCLUSTERED INDEX IX_Order_CustomerID_Status
ON [Order](CustomerID, Status)
INCLUDE (OrderDate, TotalAmount);
Step 6: Partitioning for Scale
Horizontal Partitioning (Sharding):
-- Partition by OrderDate range
CREATE PARTITION FUNCTION PF_OrderDate (DATETIME2)
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2025-01-01', '2026-01-01');
CREATE PARTITION SCHEME PS_OrderDate
AS PARTITION PF_OrderDate ALL TO ([PRIMARY]);
CREATE TABLE [Order] (
OrderID INT PRIMARY KEY,
OrderDate DATETIME2,
-- other columns
) ON PS_OrderDate(OrderDate);
Step 7: Audit & History Tracking
Temporal Tables:
ALTER TABLE Customer ADD
SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START HIDDEN,
SysEndTime DATETIME2 GENERATED ALWAYS AS ROW END HIDDEN,
PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);
ALTER TABLE Customer SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));
Performance Optimization Patterns
Pattern 1: Avoid EAV (Entity-Attribute-Value)
Anti-Pattern:
CREATE TABLE ProductAttribute (
ProductID INT,
AttributeName NVARCHAR(50),
AttributeValue NVARCHAR(200)
);
Better (JSON for semi-structured):
ALTER TABLE Product ADD Attributes NVARCHAR(MAX);
-- Store: {"color": "red", "size": "large"}
Pattern 2: Use Appropriate Data Types
-- Wrong: NVARCHAR(MAX) for short strings
-- Correct:
FirstName NVARCHAR(50)
-- Wrong: DATETIME for date-only
-- Correct:
BirthDate DATE
Pattern 3: Avoid SELECT *
-- Wrong: SELECT * FROM Customer
-- Correct:
SELECT CustomerID, FirstName, Email FROM Customer WHERE CustomerID = @id;
Data Integrity Enforcement
Referential Integrity
-- Cascade delete order items when order deleted
ALTER TABLE OrderItem
ADD CONSTRAINT FK_OrderItem_Order
FOREIGN KEY (OrderID) REFERENCES [Order](OrderID) ON DELETE CASCADE;
Check Constraints
ALTER TABLE [Order] ADD CONSTRAINT CK_Order_Status
CHECK (Status IN ('Pending', 'Shipped', 'Delivered', 'Cancelled'));
Unique Constraints
CREATE UNIQUE INDEX UQ_Customer_Email ON Customer(Email);
Troubleshooting Design Issues
Issue: Slow joins on large tables
Solution: Add covering indexes; review normalization vs denormalization trade-offs
Issue: Deadlocks on updates
Solution: Reduce transaction scope; use optimistic concurrency (rowversion)
Issue: Data anomalies (update/delete/insert)
Solution: Review normalization; ensure constraints enforce rules
Best Practices Summary
- Normalize to 3NF by default
- Denormalize strategically for read-heavy workloads
- Use surrogate keys (IDENTITY/GUID) for primary keys
- Index foreign keys and frequently queried columns
- Enforce integrity via constraints, not application code
- Plan for scale with partitioning and archival strategies
Key Takeaways
- Proper normalization prevents data anomalies.
- Strategic denormalization improves query performance.
- Indexes and constraints balance speed and integrity.
- Temporal tables simplify audit requirements.
Next Steps
- Perform schema review on existing databases
- Identify denormalization candidates for reporting
- Implement partitioning for historical data
Additional Resources
Which schema will you optimize first?