Power BI Data Modeling: Building Effective Data Models
Introduction
Effective data modeling is the foundation of Power BI solutions. This guide covers star schema design principles, table relationships, calculated columns and measures, optimization techniques, role-playing dimensions, slowly changing dimensions, and best practices for building scalable analytics models.
Star Schema Design
Understanding Star Schema
Star Schema Components:
1. Fact Table (center of star):
- Contains measurable metrics (sales, costs, quantities)
- Foreign keys to dimension tables
- Granular transaction data
- Large number of rows
Example: Sales Fact
- OrderID
- OrderDate (FK to Date dimension)
- CustomerID (FK to Customer dimension)
- ProductID (FK to Product dimension)
- Quantity
- UnitPrice
- TotalAmount
2. Dimension Tables (points of star):
- Descriptive attributes
- Primary keys
- Relatively fewer rows
- Used for filtering and grouping
Examples:
- Date dimension (DateKey, Year, Quarter, Month, Day)
- Customer dimension (CustomerID, Name, City, Country)
- Product dimension (ProductID, Name, Category, Subcategory)
Benefits:
- Simple to understand
- Fast query performance
- Easy to navigate
- Optimized for analytics
Creating Fact Tables
-- Load fact table from SQL
Sales =
EVALUATE
SUMMARIZECOLUMNS(
'Order'[OrderID],
'Order'[OrderDate],
'Order'[CustomerID],
'Order'[ProductID],
"Quantity", SUM('OrderDetails'[Quantity]),
"UnitPrice", AVERAGE('OrderDetails'[UnitPrice]),
"TotalAmount", SUM('OrderDetails'[Quantity] * 'OrderDetails'[UnitPrice])
)
-- Fact table characteristics:
-- ✓ Measurable metrics
-- ✓ Foreign keys only
-- ✓ No descriptive text
-- ✓ Additive measures
-- ✗ Avoid unnecessary columns
Creating Dimension Tables
-- Date dimension (calendar table)
DateDimension =
ADDCOLUMNS(
CALENDAR(DATE(2020, 1, 1), DATE(2030, 12, 31)),
"Year", YEAR([Date]),
"Quarter", "Q" & FORMAT([Date], "Q"),
"Month", FORMAT([Date], "MMM"),
"MonthNumber", MONTH([Date]),
"Day", DAY([Date]),
"DayOfWeek", FORMAT([Date], "DDD"),
"DayOfWeekNumber", WEEKDAY([Date]),
"YearMonth", FORMAT([Date], "YYYY-MM"),
"IsWeekend", WEEKDAY([Date]) IN {1, 7}
)
-- Product dimension
Product =
SELECTCOLUMNS(
'ProductTable',
"ProductID", [ProductID],
"ProductName", [Name],
"Category", [Category],
"Subcategory", [Subcategory],
"Brand", [Brand],
"Color", [Color],
"ListPrice", [ListPrice]
)
-- Customer dimension
Customer =
SELECTCOLUMNS(
'CustomerTable',
"CustomerID", [CustomerID],
"CustomerName", [FirstName] & " " & [LastName],
"Email", [Email],
"City", [City],
"State", [State],
"Country", [Country],
"Segment", [CustomerSegment]
)
Table Relationships
Creating Relationships
Relationship types:
1. One-to-Many (most common):
- Dimension (one) → Fact (many)
- Example: Product.ProductID → Sales.ProductID
- One product can have many sales
2. Many-to-One:
- Reverse of one-to-many
- Fact (many) → Dimension (one)
3. One-to-One:
- Rare in star schema
- Used for splitting large tables
4. Many-to-Many:
- Avoid in star schema
- Use bridge tables instead
Creating relationships in Power BI Desktop:
1. Model view
2. Drag from dimension key to fact foreign key
3. Verify cardinality: 1:* (one-to-many)
4. Cross filter direction: Single (dimension filters fact)
Relationship Properties
Key properties:
1. Cardinality:
- One-to-many (1:*)
- Many-to-one (*:1)
- One-to-one (1:1)
- Many-to-many (*:*)
2. Cross filter direction:
- Single: Dimension filters fact (standard)
- Both: Bidirectional filtering (use carefully)
3. Make this relationship active:
- Active: Used in calculations
- Inactive: Used with USERELATIONSHIP()
4. Assume referential integrity:
- Improves performance
- Use when all fact keys exist in dimension
Role-Playing Dimensions
Role-playing dimension: Same dimension used multiple times
Example: Date dimension with multiple roles
- OrderDate
- ShipDate
- DueDate
Implementation:
1. Physical tables (not recommended):
- Duplicate Date table 3 times
- Creates model bloat
2. Multiple relationships (recommended):
- One Date table
- Multiple relationships to Sales:
* Sales[OrderDate] → Date[Date] (active)
* Sales[ShipDate] → Date[Date] (inactive)
* Sales[DueDate] → Date[Date] (inactive)
Usage in DAX:
CALCULATE(
[Total Sales],
USERELATIONSHIP(Sales[ShipDate], 'Date'[Date])
)
Calculated Columns vs Measures
Calculated Columns
-- Calculated column: Computed row-by-row at refresh
-- Full Name
Customer[Full Name] =
Customer[FirstName] & " " & Customer[LastName]
-- Age Group
Customer[Age Group] =
SWITCH(
TRUE(),
Customer[Age] < 18, "Under 18",
Customer[Age] < 35, "18-34",
Customer[Age] < 55, "35-54",
"55+"
)
-- Profit
Sales[Profit] = Sales[Revenue] - Sales[Cost]
-- Year-Month
Sales[YearMonth] = FORMAT(Sales[OrderDate], "YYYY-MM")
When to use:
✓ Needed for filtering/slicing
✓ Rarely changes
✓ Row context calculations
✗ Avoid for aggregations (use measures)
Measures
-- Measure: Computed dynamically based on filter context
-- Total Sales
Total Sales = SUM(Sales[Amount])
-- Total Quantity
Total Quantity = SUM(Sales[Quantity])
-- Average Sale
Average Sale = DIVIDE([Total Sales], [Total Quantity], 0)
-- Sales YoY Growth %
Sales YoY Growth % =
VAR CurrentYear = [Total Sales]
VAR PreviousYear =
CALCULATE(
[Total Sales],
DATEADD('Date'[Date], -1, YEAR)
)
RETURN
DIVIDE(CurrentYear - PreviousYear, PreviousYear, 0)
-- Running Total
Running Total =
CALCULATE(
[Total Sales],
FILTER(
ALLSELECTED('Date'),
'Date'[Date] <= MAX('Date'[Date])
)
)
When to use:
✓ Aggregations (SUM, AVG, COUNT)
✓ Dynamic calculations
✓ Context-dependent results
✓ Better performance for large datasets
DAX Fundamentals
Filter Context
-- Filter context: Filters applied to the model
-- Example report:
-- Visual: Matrix
-- Rows: Product[Category]
-- Values: [Total Sales]
-- For each row, filter context is:
-- Category = "Electronics" → Total Sales for Electronics
-- Category = "Clothing" → Total Sales for Clothing
-- Modify filter context with CALCULATE:
Sales All Categories =
CALCULATE(
[Total Sales],
ALL(Product[Category]) -- Remove category filter
)
Sales for Electronics =
CALCULATE(
[Total Sales],
Product[Category] = "Electronics" -- Apply specific filter
)
Row Context
-- Row context: Current row in table iteration
-- Calculated column has row context:
Sales[Line Total] = Sales[Quantity] * Sales[UnitPrice]
-- Iterate functions create row context:
Total Freight =
SUMX(
Sales, -- Iterate over Sales table
Sales[Quantity] * Sales[FreightPerUnit] -- Row context
)
-- Convert row context to filter context with CALCULATE:
Product Total Sales =
CALCULATE(
[Total Sales] -- Uses filter context
)
-- In calculated column, CALCULATE converts current row to filter
Time Intelligence
-- Requires proper Date table with Date type column
-- Year-to-Date
YTD Sales = TOTALYTD([Total Sales], 'Date'[Date])
-- Month-to-Date
MTD Sales = TOTALMTD([Total Sales], 'Date'[Date])
-- Quarter-to-Date
QTD Sales = TOTALQTD([Total Sales], 'Date'[Date])
-- Previous Year
PY Sales = CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Date'[Date]))
-- Year-over-Year Growth
YoY Growth = [Total Sales] - [PY Sales]
YoY Growth % = DIVIDE([YoY Growth], [PY Sales], 0)
-- Moving Average (3 months)
Sales 3M MA =
CALCULATE(
[Total Sales],
DATESINPERIOD(
'Date'[Date],
LASTDATE('Date'[Date]),
-3,
MONTH
)
) / 3
-- Custom fiscal year (starts July)
FY Sales =
CALCULATE(
[Total Sales],
DATESYTD('Date'[Date], "06-30")
)
Data Model Optimization
Reducing Model Size
1. Remove unnecessary columns:
- Delete unused columns
- Don't import everything from source
2. Change data types:
- Use smallest appropriate type
- Text: 50 chars vs unlimited
- Integer: 32-bit vs 64-bit
- Decimal: Fixed vs floating
3. Disable auto date/time:
- File → Options → Data Load
- Uncheck "Auto date/time"
- Create explicit date table
4. Use summarized tables:
- Pre-aggregate at source
- Example: Daily instead of transaction-level
5. Split date and time:
- Store date and time in separate columns
- Better compression
Column Statistics
View column statistics:
View → Column Quality
View → Column Distribution
View → Column Profile
Metrics:
- Cardinality: Unique values
- Distinct count
- Empty values
- Error count
Low cardinality = better compression
High cardinality = consider partitioning
Relationships Best Practices
1. Use integer keys:
- Better performance than text keys
- Surrogate keys recommended
2. Single direction filtering:
- Avoid bidirectional unless necessary
- Can cause ambiguity
3. Active relationships:
- One active path between tables
- Inactive for role-playing dimensions
4. Avoid many-to-many:
- Use bridge tables
- Better performance
Advanced Patterns
Slowly Changing Dimensions (SCD)
-- Type 1 SCD: Overwrite old values
-- Simply update dimension table
-- History is lost
-- Type 2 SCD: Preserve history
-- Add versioning columns to dimension:
Customer_SCD2 =
ADDCOLUMNS(
Customer,
"ValidFrom", [EffectiveDate],
"ValidTo", [ExpirationDate],
"IsCurrent", [ExpirationDate] = DATE(9999, 12, 31)
)
-- Filter for current records only
Current Customers =
FILTER(
Customer_SCD2,
Customer_SCD2[IsCurrent] = TRUE
)
-- Point-in-time query
Customers At Date =
FILTER(
Customer_SCD2,
AND(
Customer_SCD2[ValidFrom] <= [Selected Date],
Customer_SCD2[ValidTo] > [Selected Date]
)
)
Parent-Child Hierarchies
-- Employee table with Manager relationship
-- Calculate hierarchy level
Level =
PATHLENGTH(Employee[Path])
-- Get manager name
Manager Name =
LOOKUPVALUE(
Employee[Name],
Employee[EmployeeID],
Employee[ManagerID]
)
-- Aggregate up hierarchy
Total Sales Including Subordinates =
CALCULATE(
[Total Sales],
TREATAS(
PATHCONTAINS(
Employee[Path],
Employee[EmployeeID]
),
Employee[EmployeeID]
)
)
Many-to-Many with Bridge Table
Scenario: Products can belong to multiple categories
Tables:
- Product (ProductID, ProductName)
- Category (CategoryID, CategoryName)
- ProductCategory bridge (ProductID, CategoryID)
Relationships:
Product → ProductCategory (many-to-one)
Category → ProductCategory (many-to-one)
Enable many-to-many:
Set relationship cross filter direction to Both
Composite Models and Dual Storage Mode
Understanding Storage Modes
Power BI supports three storage modes per table:
1. Import Mode (Default)
- Data loaded into Power BI model
- Compressed using VertiPaq engine
- Fast query performance
- Requires refresh to update
- Limited by memory capacity
Use Cases:
✓ Historical analysis
✓ Complex calculations
✓ Moderate data volumes (<10GB compressed)
✓ Fast visual interactions required
2. DirectQuery Mode
- Data queried from source in real-time
- No data import, minimal memory usage
- Always up-to-date
- Query performance depends on source
- Limited DAX functionality
Use Cases:
✓ Real-time data requirements
✓ Very large datasets (>100GB)
✓ Data sovereignty/compliance needs
✓ Source has good query performance
3. Dual Mode (Hybrid)
- Table acts as Import or DirectQuery contextually
- Power BI chooses optimal mode per query
- Best of both worlds
Use Cases:
✓ Dimension tables in large DirectQuery models
✓ Improve performance while maintaining real-time fact data
Creating Composite Models
Scenario: Sales fact table (DirectQuery) + Dimension tables (Import)
Steps:
1. Connect to SQL with DirectQuery
2. Load Sales fact table (DirectQuery mode)
3. Add dimension tables:
- Home → Get Data → Other sources
- Select dimension tables
- Load as Import mode
4. Create relationships between Import and DirectQuery tables
5. Set dimension tables to Dual mode for optimal performance
Benefits:
- Real-time sales data (DirectQuery)
- Fast dimension filtering (Import/Dual)
- Reduced source query load
- Flexible refresh schedule for dimensions
Aggregation Tables
Aggregations dramatically improve large DirectQuery model performance by pre-calculating summaries.
Creating Aggregations:
// Original detail table (DirectQuery):
// Sales: 1B rows (OrderID, ProductID, OrderDate, CustomerID, Amount)
// Create aggregation table (Import):
Sales_Agg_Daily =
SUMMARIZECOLUMNS(
Sales[OrderDate],
Sales[ProductID],
"Total Amount", SUM(Sales[Amount]),
"Total Quantity", SUM(Sales[Quantity]),
"Order Count", DISTINCTCOUNT(Sales[OrderID])
)
// Configure aggregation in Power BI:
// 1. Right-click Sales_Agg_Daily → Manage aggregations
// 2. Map columns:
// - OrderDate → Sales[OrderDate] (GroupBy)
// - ProductID → Sales[ProductID] (GroupBy)
// - Total Amount → Sales[Amount] (Sum)
// - Total Quantity → Sales[Quantity] (Sum)
// - Order Count → Sales[OrderID] (Count)
// 3. Set detail table to Sales
// 4. Save
// Power BI automatically uses aggregation when possible
// Query: "Total sales by date" → Uses Sales_Agg_Daily (fast)
// Query: "Total sales by customer" → Uses Sales detail (slower)
Aggregation Best Practices:
| Grain | Use Case | Performance Gain |
|---|---|---|
| Daily × Product | Sales by date/product | 100-1000× faster |
| Monthly × Category | Trend analysis | 1000-10000× faster |
| Yearly × Region | Executive dashboards | 10000+× faster |
Guidelines:
✓ Create multiple aggregation levels (day, month, year)
✓ Include most-queried dimension combinations
✓ Keep aggregation tables in Import mode
✓ Detail table stays in DirectQuery
✗ Don't aggregate to single row (defeats purpose)
✗ Don't include low-cardinality dimensions only
DirectQuery Optimization
Query Reduction Settings
File → Options → Data Load
Query reduction settings:
☑ Reduce number of queries sent by slicers
- Apply button required before query executes
☑ Apply button for filters and slicers
- User must explicitly apply selections
☑ Reduce number of queries sent by visuals
- Cross-highlighting doesn't query until complete
Native SQL Query Monitoring
Performance Analyzer → Start Recording → Interact with visual
Review "Direct Query" section:
- Query: Shows SQL sent to source
- Duration: Time for source to return data
Optimization targets:
- < 1 second: Excellent
- 1-3 seconds: Acceptable
- > 3 seconds: Needs optimization (add indexes, partitioning, views)
DirectQuery Best Practices
// ❌ BAD: Complex calculated columns in DirectQuery
// These get recalculated per query
Sales[Profit] = Sales[Amount] - Sales[Cost] // Calculated column (avoid in DQ)
// ✅ GOOD: Pre-calculate in source
// Create view or computed column in SQL Server:
CREATE VIEW Sales_With_Profit AS
SELECT
OrderID,
Amount,
Cost,
(Amount - Cost) AS Profit
FROM Sales;
// ❌ BAD: Row-level security with complex logic in DirectQuery
[RLS Rule] =
CALCULATE(
DISTINCTCOUNT(Customer[CustomerID]),
FILTER(Customer, Customer[Region] = USERPRINCIPALNAME())
) > 0
// ✅ GOOD: Implement RLS in source database
-- SQL Server row-level security:
CREATE SECURITY POLICY CustomerFilter
ADD FILTER PREDICATE dbo.fn_securitypredicate(UserName)
ON dbo.Customer
WITH (STATE = ON);
Hybrid Table Patterns
Scenario: Recent data (hot) + Historical data (cold)
Implementation:
1. Partition table in source:
- Sales_Hot: Last 3 months (DirectQuery)
- Sales_Historical: Older data (Import)
2. Create union view in Power BI:
Sales =
UNION(
Sales_Hot, // DirectQuery (real-time)
Sales_Historical // Import (fast performance)
)
3. Benefits:
- Recent data always current
- Historical analysis performant
- Reduced source load
- Flexible refresh schedule (historical: weekly; hot: real-time)
Advanced Relationship Patterns
Bidirectional Filtering
Use with Caution:
Standard: Single direction (Dimension → Fact)
Product → Sales: Product filters Sales (one-to-many)
Bidirectional: Both directions
Product ↔ Sales: Product filters Sales AND Sales filters Product
When to use bidirectional:
✓ Many-to-many relationships
✓ Bridging tables
✓ Specific advanced scenarios
Risks:
✗ Ambiguous filter propagation
✗ Performance degradation
✗ Unexpected results in complex models
Example Scenario:
Tables: Sales ↔ Budget (many-to-many)
Goal: Show products with sales but no budget
Solution:
1. Create bridge table: ProductBridge (ProductID)
2. Relationships:
- Product → ProductBridge (one-to-many, single)
- Sales → ProductBridge (many-to-one, single)
- Budget → ProductBridge (many-to-one, single)
3. Enable bidirectional on ProductBridge → Product
Result: Sales and Budget cross-filter through bridge
Inactive Relationships with USERELATIONSHIP
// Date dimension with multiple roles
// Active relationship: Sales[OrderDate] → Date[Date]
Sales by Order Date = SUM(Sales[Amount])
// Inactive: Sales[ShipDate] → Date[Date]
Sales by Ship Date =
CALCULATE(
SUM(Sales[Amount]),
USERELATIONSHIP(Sales[ShipDate], 'Date'[Date])
)
// Inactive: Sales[DueDate] → Date[Date]
Sales by Due Date =
CALCULATE(
SUM(Sales[Amount]),
USERELATIONSHIP(Sales[DueDate], 'Date'[Date])
)
// Dynamic selection with parameter
Sales by Selected Date =
VAR SelectedRole = SELECTEDVALUE('DateRole'[Role], "OrderDate")
RETURN
SWITCH(
SelectedRole,
"OrderDate", [Sales by Order Date],
"ShipDate", [Sales by Ship Date],
"DueDate", [Sales by Due Date],
BLANK()
)
Circular Dependency Resolution
Problem: A → B → C → A (circular relationship)
Example:
- Employee → Manager (Employee table self-reference)
- Sales → Product → Category → SubcategoryBridge → Sales
Solutions:
1. Flatten hierarchy (best for small hierarchies):
- Denormalize Category into Product table
- Single relationship: Sales → Product
2. Use calculated table to break cycle:
- Keep relationships: Sales → Product → Category
- Create: CategoryList = VALUES(Category)
- Use CategoryList for slicing (no relationship)
3. DAX-based filtering (for complex scenarios):
- Remove problematic relationship
- Use TREATAS or CROSSFILTER in measures
Data Model Optimization Techniques
Column Optimization
Reduce Cardinality:
1. Remove high-cardinality text columns:
❌ Keep: Customer[Email] (5M unique values)
✅ Remove: Use CustomerID for relationships only
2. Convert to low-cardinality:
❌ Sales[Comments] (free text, millions of unique values)
✅ Sales[Has Comments] (Boolean: TRUE/FALSE)
3. Optimize data types:
❌ Text: "2025-01-27" (larger storage)
✅ Date: 2025-01-27 (optimized storage)
❌ Decimal: 123.4567890 (8 bytes)
✅ Whole Number: 1234 (4 bytes) if decimals unnecessary
4. Use integer keys instead of text:
❌ Product[SKU] = "PROD-12345-XYZ" (text)
✅ Product[ProductID] = 12345 (integer, relationship key)
Model Compression Statistics
View model size:
External Tools → DAX Studio → View → DMV Browser → $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
Key metrics:
- Table Size: MB per table
- Column Cardinality: Unique values
- Dictionary Size: Compressed size
- Data Type: Optimization opportunities
Target reductions:
- Remove unused columns: 10-50% size reduction
- Optimize data types: 20-40% size reduction
- Reduce cardinality: 30-60% size reduction
- Remove calculated columns: 10-30% size reduction (replace with measures)
Incremental Refresh Configuration
Scenario: 10-year historical Sales table, daily updates
Configuration:
1. Define parameters (Power Query):
- RangeStart (DateTime)
- RangeEnd (DateTime)
2. Filter table using parameters:
Sales = Table.SelectRows(Source, each
[OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
3. Configure incremental refresh policy (Model view):
- Right-click Sales table → Incremental refresh
- Archive data starting: 9 years before refresh date
- Incrementally refresh data starting: 7 days before refresh date
- Detect data changes: [OrderDate]
4. Benefits:
- Full refresh only needed for last 7 days (fast)
- Historical data (9+ years) never refreshed (stable)
- Automatic partition management
- 100× faster refresh times
Data Model Versioning and Governance
Model Documentation
// Table-level documentation
// Right-click table → Properties → Description
Sales_Description =
"Fact table: Sales transactions
Source: SQL Server - AdventureWorks.dbo.Sales
Refresh: Daily at 2 AM UTC
Grain: One row per order line item
Last Modified: 2025-01-27 by Vladimir Luis"
// Measure-level documentation
Total Sales = SUM(Sales[Amount])
// Description: "Sum of all sales amounts. Includes discounts. Excludes returns (use [Net Sales] for returns-adjusted)."
// Relationship documentation (in model diagram)
// Add text boxes with notes:
"⚠️ OrderDate is active relationship
ShipDate and DueDate are inactive - use USERELATIONSHIP() in measures"
Change Tracking
Version control for Power BI models:
1. Save as PBIX + PBIT (template):
- PBIX: Full model with data
- PBIT: Model structure only (no data)
- Commit PBIT to Git for version control
2. Use external tools:
- Tabular Editor 2/3: Extract model as JSON
- Commit JSON to Git
- Track changes with diff tools
3. Document changes:
- Changelog table in model
- Version number in report title
- Release notes page
Changelog = #table(
{"Version", "Date", "Changes", "Author"},
{
{"2.1", #date(2025,1,27), "Added sales aggregations", "Vladimir"},
{"2.0", #date(2025,1,15), "Implemented composite model", "Vladimir"},
{"1.5", #date(2024,12,10), "Added DirectQuery for real-time data", "Team"}
}
)
Model Health Checks
// Create "Model Health" page with these measures:
Tables Count = COUNTROWS(INFO.TABLES())
Relationships Count = COUNTROWS(INFO.RELATIONSHIPS())
Measures Count = COUNTROWS(INFO.MEASURES())
Calculated Columns Count = COUNTROWS(INFO.COLUMNS("CALCULATED_COLUMN"))
Unused Columns =
VAR AllColumns = INFO.COLUMNS()
VAR UsedInVisuals = INFO.COLUMNS("USED_IN_VISUALS")
RETURN COUNTROWS(EXCEPT(AllColumns, UsedInVisuals))
Model Size MB =
SUMX(
INFO.STORAGE_TABLE_COLUMNS(),
[DICTIONARY_SIZE] + [DATA_SIZE]
) / 1024 / 1024
Refresh Duration =
"Last refresh: " & FORMAT([Last Refresh Time], "YYYY-MM-DD HH:mm")
// Alert: Tables without relationships
Orphaned Tables =
VAR AllTables = VALUES(INFO.TABLES[TABLE_NAME])
VAR TablesInRelationships =
UNION(
VALUES(INFO.RELATIONSHIPS[FROM_TABLE]),
VALUES(INFO.RELATIONSHIPS[TO_TABLE])
)
RETURN COUNTROWS(EXCEPT(AllTables, TablesInRelationships))
Performance Optimization
-- Bad: Multiple CALCULATE calls
Slow Measure =
CALCULATE([Total Sales], Product[Category] = "A") +
CALCULATE([Total Sales], Product[Category] = "B") +
CALCULATE([Total Sales], Product[Category] = "C")
-- Good: Single CALCULATE with filter
Fast Measure =
CALCULATE(
[Total Sales],
Product[Category] IN {"A", "B", "C"}
)
-- Bad: Nested CALCULATE
Slow =
CALCULATE(
CALCULATE(
[Total Sales],
Product[Category] = "A"
),
'Date'[Year] = 2025
)
-- Good: Combined filters
Fast =
CALCULATE(
[Total Sales],
Product[Category] = "A",
'Date'[Year] = 2025
)
-- Use variables for repeated calculations
Optimized Measure =
VAR TotalSales = [Total Sales]
VAR TotalCost = [Total Cost]
VAR Profit = TotalSales - TotalCost
VAR Margin = DIVIDE(Profit, TotalSales, 0)
RETURN
IF(Margin > 0.2, "High", "Low")
Data Model Design Patterns and Anti-Patterns
✅ Design Patterns (Best Practices)
1. Conformed Dimensions
Pattern: Share dimension tables across multiple fact tables
Example:
- Sales fact → Date dimension
- Budget fact → Date dimension
- Inventory fact → Date dimension
(All use same Date dimension)
Benefits:
✓ Consistent filtering across facts
✓ Simplified model maintenance
✓ Reduced model size
✓ Easier user experience
Implementation:
- Create single Date dimension
- Establish relationships from all facts
- Use in slicers for cross-fact filtering
2. Junk Dimensions
Pattern: Combine low-cardinality flags into single dimension
Instead of:
- Sales[IsOnline] (Boolean)
- Sales[IsPriority] (Boolean)
- Sales[IsInternational] (Boolean)
- Sales[PaymentMethod] (3 values)
Create:
OrderAttributes dimension:
- AttributeID (1-24, all combinations)
- IsOnline (Y/N)
- IsPriority (Y/N)
- IsInternational (Y/N)
- PaymentMethod (Cash/Card/Online)
Benefits:
✓ Reduces fact table width
✓ Improves compression
✓ Easier to add new flags
✓ Better performance
3. Surrogate Keys
Pattern: Integer keys instead of natural keys
❌ Natural Key: Product[SKU] = "PROD-ELECT-12345-XYZ"
✅ Surrogate Key: Product[ProductID] = 12345
Benefits:
✓ Faster joins (integer vs text)
✓ Smaller storage
✓ Handles source key changes
✓ Better compression
Implementation:
-- Power Query: Add Index Column
= Table.AddIndexColumn(Source, "ProductID", 1, 1)
4. Snowflake to Star Conversion
Snowflake (normalized):
Sales → Product → Category → SubCategory (3 hops)
Star (denormalized):
Sales → Product (with Category, SubCategory columns)
Conversion in Power Query:
Product_Flattened =
Table.ExpandTableColumn(
Table.NestedJoin(Product, "CategoryID", Category, "CategoryID", "Category"),
"Category", {"CategoryName", "SubCategoryName"}
)
Benefits:
✓ Faster queries (fewer joins)
✓ Simplified relationships
✓ Better for business users
5. Fact Table Consolidation
Multiple fact tables with same grain → Single fact table with type dimension
Instead of:
- Sales_Online (OrderID, Amount, Quantity)
- Sales_Store (OrderID, Amount, Quantity)
- Sales_Phone (OrderID, Amount, Quantity)
Create:
- Sales (OrderID, Amount, Quantity, ChannelID)
- Channel (ChannelID, ChannelName: Online/Store/Phone)
Benefits:
✓ Simpler model
✓ Unified analysis
✓ Easier maintenance
✓ Single source of truth
Implementation:
Combined_Sales = Table.Combine({
Table.AddColumn(Sales_Online, "Channel", each "Online"),
Table.AddColumn(Sales_Store, "Channel", each "Store"),
Table.AddColumn(Sales_Phone, "Channel", each "Phone")
})
❌ Anti-Patterns (Common Mistakes)
1. Multiple Fact Tables at Different Grains
❌ WRONG:
- Sales (OrderID, ProductID, DailyTotal) -- Order-level
- Inventory (ProductID, Date, Quantity) -- Daily level
→ Cannot reliably join or compare
✅ CORRECT:
- Bring both to common grain (Product × Date)
- OR create separate reports for each grain
- OR use calculation groups for different perspectives
2. Bidirectional Relationships Everywhere
❌ WRONG:
Product ↔ Sales ↔ Customer ↔ Region (all bidirectional)
→ Ambiguous filter paths, unexpected results, performance issues
✅ CORRECT:
Product → Sales ← Customer ← Region (single direction)
→ Clear filter propagation, predictable results
3. Many Calculated Columns
❌ WRONG:
Sales[Profit] = Sales[Revenue] - Sales[Cost] (Calculated column)
Sales[Margin] = DIVIDE(Sales[Profit], Sales[Revenue], 0) (Calculated column)
→ Increases model size, slower refresh, wastes memory
✅ CORRECT:
Profit = SUM(Sales[Revenue]) - SUM(Sales[Cost]) (Measure)
Margin = DIVIDE([Profit], SUM(Sales[Revenue]), 0) (Measure)
→ Calculated on-demand, no storage, flexible aggregation
4. Import Mode for Real-Time Data
❌ WRONG:
Importing live IoT sensor data (updates every second)
→ Constant refresh load, data never current, infrastructure strain
✅ CORRECT:
DirectQuery for real-time data
Import for historical analysis
Composite model combining both
5. No Date Table
❌ WRONG:
Using Sales[OrderDate] directly for time intelligence
→ Time intelligence functions fail, no fiscal calendar support
✅ CORRECT:
Create proper date dimension with contiguous dates
Mark as date table
Use for all time intelligence
6. Circular Relationships
❌ WRONG:
Employee → Manager → Department → Employee (circular)
→ Model won't load, calculations fail
✅ CORRECT:
Break cycle: Remove one relationship
Use DAX (USERELATIONSHIP, TREATAS) for complex scenarios
OR denormalize to eliminate circular dependency
Decision Framework
When to Import vs DirectQuery:
| Factor | Import | DirectQuery |
|---|---|---|
| Data size | < 10GB compressed | > 10GB or very large |
| Refresh frequency | Daily or less frequent | Real-time required |
| Query performance need | Critical (sub-second) | Acceptable (1-5 seconds) |
| Data sovereignty | Flexible | Must stay in source |
| DAX complexity | Full DAX support | Limited DAX support |
| Network reliability | Not critical | Must be reliable |
When to Use Calculated Column vs Measure:
| Use Calculated Column When: | Use Measure When: |
|---|---|
| Need to filter/slice by result | Aggregating data (SUM, AVG, COUNT) |
| Value is static (e.g., Full Name) | Result changes with filter context |
| Required in relationships | Performance is critical |
| Low cardinality result | Dynamic business logic |
When to Use Bidirectional Relationships:
✅ Use bidirectional when:
- Many-to-many with bridge table
- Specific measure requires it
- Model is small and performance acceptable
❌ Avoid bidirectional when:
- Standard star schema (use single direction)
- Performance is critical
- Model has many tables (complexity)
- Can achieve same result with DAX
Key Takeaways
- Star schema is the foundation: Fact tables (measures) + dimension tables (attributes) = optimal analytics model
- Relationships drive filtering: One-to-many from dimensions to facts; understand cardinality and filter direction
- Storage mode matters: Import for performance, DirectQuery for real-time, Composite for best of both worlds
- Calculated columns vs measures: Columns computed at refresh (static), measures at query time (dynamic) - prefer measures
- Date dimension is mandatory: Contiguous date table marked as date table enables time intelligence functions
- Aggregations accelerate DirectQuery: Pre-calculate summaries for 100-1000× performance gains on large datasets
- Multiple relationships via USERELATIONSHIP: Role-playing dimensions (OrderDate, ShipDate) with inactive relationships
- Model optimization reduces size: Remove unused columns, optimize data types, use integer keys, reduce cardinality
- Incremental refresh for large tables: Partition by date, refresh recent data only, archive historical data
- Documentation prevents confusion: Comment tables/measures, track versions, create model health dashboards
- Design patterns prevent rework: Conformed dimensions, junk dimensions, surrogate keys, fact consolidation
- Anti-patterns cause problems: Avoid many calculated columns, bidirectional relationships everywhere, circular dependencies
- Composite models enable hybrid scenarios: Real-time facts (DirectQuery) + performant dimensions (Import/Dual)
- Performance profiling is essential: Use Performance Analyzer and DAX Studio to identify and fix bottlenecks
- Filter context understanding is critical: Visuals create filter context; relationships propagate it; CALCULATE modifies it
Next Steps
- Design star schema for your data
- Create date dimension table
- Establish table relationships
- Build core measures library
- Implement time intelligence
- Optimize model size
- Test query performance
- Document data model
Additional Resources
Model. Relate. Calculate. Visualize.