Power BI Data Modeling: Building Effective Data Models

Introduction

Effective data modeling is the foundation of Power BI solutions. This guide covers star schema design principles, table relationships, calculated columns and measures, optimization techniques, role-playing dimensions, slowly changing dimensions, and best practices for building scalable analytics models.

Star Schema Design

Understanding Star Schema

Star Schema Components:

1. Fact Table (center of star):
   - Contains measurable metrics (sales, costs, quantities)
   - Foreign keys to dimension tables
   - Granular transaction data
   - Large number of rows

   Example: Sales Fact
   - OrderID
   - OrderDate (FK to Date dimension)
   - CustomerID (FK to Customer dimension)
   - ProductID (FK to Product dimension)
   - Quantity
   - UnitPrice
   - TotalAmount

2. Dimension Tables (points of star):
   - Descriptive attributes
   - Primary keys
   - Relatively fewer rows
   - Used for filtering and grouping

   Examples:
   - Date dimension (DateKey, Year, Quarter, Month, Day)
   - Customer dimension (CustomerID, Name, City, Country)
   - Product dimension (ProductID, Name, Category, Subcategory)

Benefits:
- Simple to understand
- Fast query performance
- Easy to navigate
- Optimized for analytics

Creating Fact Tables

-- Load fact table from SQL
Sales = 
EVALUATE
SUMMARIZECOLUMNS(
    'Order'[OrderID],
    'Order'[OrderDate],
    'Order'[CustomerID],
    'Order'[ProductID],
    "Quantity", SUM('OrderDetails'[Quantity]),
    "UnitPrice", AVERAGE('OrderDetails'[UnitPrice]),
    "TotalAmount", SUM('OrderDetails'[Quantity] * 'OrderDetails'[UnitPrice])
)

-- Fact table characteristics:
-- ✓ Measurable metrics
-- ✓ Foreign keys only
-- ✓ No descriptive text
-- ✓ Additive measures
-- ✗ Avoid unnecessary columns

Creating Dimension Tables

-- Date dimension (calendar table)
DateDimension = 
ADDCOLUMNS(
    CALENDAR(DATE(2020, 1, 1), DATE(2030, 12, 31)),
    "Year", YEAR([Date]),
    "Quarter", "Q" & FORMAT([Date], "Q"),
    "Month", FORMAT([Date], "MMM"),
    "MonthNumber", MONTH([Date]),
    "Day", DAY([Date]),
    "DayOfWeek", FORMAT([Date], "DDD"),
    "DayOfWeekNumber", WEEKDAY([Date]),
    "YearMonth", FORMAT([Date], "YYYY-MM"),
    "IsWeekend", WEEKDAY([Date]) IN {1, 7}
)

-- Product dimension
Product = 
SELECTCOLUMNS(
    'ProductTable',
    "ProductID", [ProductID],
    "ProductName", [Name],
    "Category", [Category],
    "Subcategory", [Subcategory],
    "Brand", [Brand],
    "Color", [Color],
    "ListPrice", [ListPrice]
)

-- Customer dimension
Customer = 
SELECTCOLUMNS(
    'CustomerTable',
    "CustomerID", [CustomerID],
    "CustomerName", [FirstName] & " " & [LastName],
    "Email", [Email],
    "City", [City],
    "State", [State],
    "Country", [Country],
    "Segment", [CustomerSegment]
)

Table Relationships

Creating Relationships

Relationship types:

1. One-to-Many (most common):
   - Dimension (one) → Fact (many)
   - Example: Product.ProductID → Sales.ProductID
   - One product can have many sales

2. Many-to-One:
   - Reverse of one-to-many
   - Fact (many) → Dimension (one)

3. One-to-One:
   - Rare in star schema
   - Used for splitting large tables

4. Many-to-Many:
   - Avoid in star schema
   - Use bridge tables instead

Creating relationships in Power BI Desktop:
1. Model view
2. Drag from dimension key to fact foreign key
3. Verify cardinality: 1:* (one-to-many)
4. Cross filter direction: Single (dimension filters fact)

Relationship Properties

Key properties:

1. Cardinality:
   - One-to-many (1:*)
   - Many-to-one (*:1)
   - One-to-one (1:1)
   - Many-to-many (*:*)

2. Cross filter direction:
   - Single: Dimension filters fact (standard)
   - Both: Bidirectional filtering (use carefully)
   
3. Make this relationship active:
   - Active: Used in calculations
   - Inactive: Used with USERELATIONSHIP()

4. Assume referential integrity:
   - Improves performance
   - Use when all fact keys exist in dimension

Role-Playing Dimensions

Role-playing dimension: Same dimension used multiple times

Example: Date dimension with multiple roles
- OrderDate
- ShipDate
- DueDate

Implementation:

1. Physical tables (not recommended):
   - Duplicate Date table 3 times
   - Creates model bloat

2. Multiple relationships (recommended):
   - One Date table
   - Multiple relationships to Sales:
     * Sales[OrderDate] → Date[Date] (active)
     * Sales[ShipDate] → Date[Date] (inactive)
     * Sales[DueDate] → Date[Date] (inactive)

Usage in DAX:
CALCULATE(
    [Total Sales],
    USERELATIONSHIP(Sales[ShipDate], 'Date'[Date])
)

Calculated Columns vs Measures

Calculated Columns

-- Calculated column: Computed row-by-row at refresh

-- Full Name
Customer[Full Name] = 
Customer[FirstName] & " " & Customer[LastName]

-- Age Group
Customer[Age Group] = 
SWITCH(
    TRUE(),
    Customer[Age] < 18, "Under 18",
    Customer[Age] < 35, "18-34",
    Customer[Age] < 55, "35-54",
    "55+"
)

-- Profit
Sales[Profit] = Sales[Revenue] - Sales[Cost]

-- Year-Month
Sales[YearMonth] = FORMAT(Sales[OrderDate], "YYYY-MM")

When to use:
✓ Needed for filtering/slicing
✓ Rarely changes
✓ Row context calculations
✗ Avoid for aggregations (use measures)

Measures

-- Measure: Computed dynamically based on filter context

-- Total Sales
Total Sales = SUM(Sales[Amount])

-- Total Quantity
Total Quantity = SUM(Sales[Quantity])

-- Average Sale
Average Sale = DIVIDE([Total Sales], [Total Quantity], 0)

-- Sales YoY Growth %
Sales YoY Growth % = 
VAR CurrentYear = [Total Sales]
VAR PreviousYear = 
    CALCULATE(
        [Total Sales],
        DATEADD('Date'[Date], -1, YEAR)
    )
RETURN
    DIVIDE(CurrentYear - PreviousYear, PreviousYear, 0)

-- Running Total
Running Total = 
CALCULATE(
    [Total Sales],
    FILTER(
        ALLSELECTED('Date'),
        'Date'[Date] <= MAX('Date'[Date])
    )
)

When to use:
✓ Aggregations (SUM, AVG, COUNT)
✓ Dynamic calculations
✓ Context-dependent results
✓ Better performance for large datasets

DAX Fundamentals

Filter Context

-- Filter context: Filters applied to the model

-- Example report:
-- Visual: Matrix
-- Rows: Product[Category]
-- Values: [Total Sales]

-- For each row, filter context is:
-- Category = "Electronics" → Total Sales for Electronics
-- Category = "Clothing" → Total Sales for Clothing

-- Modify filter context with CALCULATE:
Sales All Categories = 
CALCULATE(
    [Total Sales],
    ALL(Product[Category])  -- Remove category filter
)

Sales for Electronics = 
CALCULATE(
    [Total Sales],
    Product[Category] = "Electronics"  -- Apply specific filter
)

Row Context

-- Row context: Current row in table iteration

-- Calculated column has row context:
Sales[Line Total] = Sales[Quantity] * Sales[UnitPrice]

-- Iterate functions create row context:
Total Freight = 
SUMX(
    Sales,  -- Iterate over Sales table
    Sales[Quantity] * Sales[FreightPerUnit]  -- Row context
)

-- Convert row context to filter context with CALCULATE:
Product Total Sales = 
CALCULATE(
    [Total Sales]  -- Uses filter context
)
-- In calculated column, CALCULATE converts current row to filter

Time Intelligence

-- Requires proper Date table with Date type column

-- Year-to-Date
YTD Sales = TOTALYTD([Total Sales], 'Date'[Date])

-- Month-to-Date
MTD Sales = TOTALMTD([Total Sales], 'Date'[Date])

-- Quarter-to-Date
QTD Sales = TOTALQTD([Total Sales], 'Date'[Date])

-- Previous Year
PY Sales = CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Date'[Date]))

-- Year-over-Year Growth
YoY Growth = [Total Sales] - [PY Sales]

YoY Growth % = DIVIDE([YoY Growth], [PY Sales], 0)

-- Moving Average (3 months)
Sales 3M MA = 
CALCULATE(
    [Total Sales],
    DATESINPERIOD(
        'Date'[Date],
        LASTDATE('Date'[Date]),
        -3,
        MONTH
    )
) / 3

-- Custom fiscal year (starts July)
FY Sales = 
CALCULATE(
    [Total Sales],
    DATESYTD('Date'[Date], "06-30")
)

Data Model Optimization

Reducing Model Size

1. Remove unnecessary columns:
   - Delete unused columns
   - Don't import everything from source
   
2. Change data types:
   - Use smallest appropriate type
   - Text: 50 chars vs unlimited
   - Integer: 32-bit vs 64-bit
   - Decimal: Fixed vs floating

3. Disable auto date/time:
   - File → Options → Data Load
   - Uncheck "Auto date/time"
   - Create explicit date table

4. Use summarized tables:
   - Pre-aggregate at source
   - Example: Daily instead of transaction-level
   
5. Split date and time:
   - Store date and time in separate columns
   - Better compression

Column Statistics

View column statistics:
View → Column Quality
View → Column Distribution
View → Column Profile

Metrics:
- Cardinality: Unique values
- Distinct count
- Empty values
- Error count

Low cardinality = better compression
High cardinality = consider partitioning

Relationships Best Practices

1. Use integer keys:
   - Better performance than text keys
   - Surrogate keys recommended

2. Single direction filtering:
   - Avoid bidirectional unless necessary
   - Can cause ambiguity

3. Active relationships:
   - One active path between tables
   - Inactive for role-playing dimensions

4. Avoid many-to-many:
   - Use bridge tables
   - Better performance

Advanced Patterns

Slowly Changing Dimensions (SCD)

-- Type 1 SCD: Overwrite old values
-- Simply update dimension table
-- History is lost

-- Type 2 SCD: Preserve history
-- Add versioning columns to dimension:

Customer_SCD2 = 
ADDCOLUMNS(
    Customer,
    "ValidFrom", [EffectiveDate],
    "ValidTo", [ExpirationDate],
    "IsCurrent", [ExpirationDate] = DATE(9999, 12, 31)
)

-- Filter for current records only
Current Customers = 
FILTER(
    Customer_SCD2,
    Customer_SCD2[IsCurrent] = TRUE
)

-- Point-in-time query
Customers At Date = 
FILTER(
    Customer_SCD2,
    AND(
        Customer_SCD2[ValidFrom] <= [Selected Date],
        Customer_SCD2[ValidTo] > [Selected Date]
    )
)

Parent-Child Hierarchies

-- Employee table with Manager relationship

-- Calculate hierarchy level
Level = 
PATHLENGTH(Employee[Path])

-- Get manager name
Manager Name = 
LOOKUPVALUE(
    Employee[Name],
    Employee[EmployeeID],
    Employee[ManagerID]
)

-- Aggregate up hierarchy
Total Sales Including Subordinates = 
CALCULATE(
    [Total Sales],
    TREATAS(
        PATHCONTAINS(
            Employee[Path],
            Employee[EmployeeID]
        ),
        Employee[EmployeeID]
    )
)

Many-to-Many with Bridge Table

Scenario: Products can belong to multiple categories

Tables:
- Product (ProductID, ProductName)
- Category (CategoryID, CategoryName)
- ProductCategory bridge (ProductID, CategoryID)

Relationships:
Product → ProductCategory (many-to-one)
Category → ProductCategory (many-to-one)

Enable many-to-many:
Set relationship cross filter direction to Both

Composite Models and Dual Storage Mode

Understanding Storage Modes

Power BI supports three storage modes per table:

1. Import Mode (Default)

Data loaded into Power BI model
Compressed using VertiPaq engine
Fast query performance
Requires refresh to update
Limited by memory capacity

Use Cases:
✓ Historical analysis
✓ Complex calculations
✓ Moderate data volumes (<10GB compressed)
✓ Fast visual interactions required

2. DirectQuery Mode

Data queried from source in real-time
No data import, minimal memory usage
Always up-to-date
Query performance depends on source
Limited DAX functionality

Use Cases:
✓ Real-time data requirements
✓ Very large datasets (>100GB)
✓ Data sovereignty/compliance needs
✓ Source has good query performance

3. Dual Mode (Hybrid)

Table acts as Import or DirectQuery contextually
Power BI chooses optimal mode per query
Best of both worlds

Use Cases:
✓ Dimension tables in large DirectQuery models
✓ Improve performance while maintaining real-time fact data

Creating Composite Models

Scenario: Sales fact table (DirectQuery) + Dimension tables (Import)

Steps:
1. Connect to SQL with DirectQuery
2. Load Sales fact table (DirectQuery mode)
3. Add dimension tables:
   - Home → Get Data → Other sources
   - Select dimension tables
   - Load as Import mode
4. Create relationships between Import and DirectQuery tables
5. Set dimension tables to Dual mode for optimal performance

Benefits:
- Real-time sales data (DirectQuery)
- Fast dimension filtering (Import/Dual)
- Reduced source query load
- Flexible refresh schedule for dimensions

Aggregation Tables

Aggregations dramatically improve large DirectQuery model performance by pre-calculating summaries.

Creating Aggregations:

// Original detail table (DirectQuery):
// Sales: 1B rows (OrderID, ProductID, OrderDate, CustomerID, Amount)

// Create aggregation table (Import):
Sales_Agg_Daily = 
SUMMARIZECOLUMNS(
    Sales[OrderDate],
    Sales[ProductID],
    "Total Amount", SUM(Sales[Amount]),
    "Total Quantity", SUM(Sales[Quantity]),
    "Order Count", DISTINCTCOUNT(Sales[OrderID])
)

// Configure aggregation in Power BI:
// 1. Right-click Sales_Agg_Daily → Manage aggregations
// 2. Map columns:
//    - OrderDate → Sales[OrderDate] (GroupBy)
//    - ProductID → Sales[ProductID] (GroupBy)
//    - Total Amount → Sales[Amount] (Sum)
//    - Total Quantity → Sales[Quantity] (Sum)
//    - Order Count → Sales[OrderID] (Count)
// 3. Set detail table to Sales
// 4. Save

// Power BI automatically uses aggregation when possible
// Query: "Total sales by date" → Uses Sales_Agg_Daily (fast)
// Query: "Total sales by customer" → Uses Sales detail (slower)

Aggregation Best Practices:

Grain	Use Case	Performance Gain
Daily × Product	Sales by date/product	100-1000× faster
Monthly × Category	Trend analysis	1000-10000× faster
Yearly × Region	Executive dashboards	10000+× faster

Guidelines:
✓ Create multiple aggregation levels (day, month, year)
✓ Include most-queried dimension combinations
✓ Keep aggregation tables in Import mode
✓ Detail table stays in DirectQuery
✗ Don't aggregate to single row (defeats purpose)
✗ Don't include low-cardinality dimensions only

DirectQuery Optimization

Query Reduction Settings

File → Options → Data Load

Query reduction settings:
☑ Reduce number of queries sent by slicers
  - Apply button required before query executes
  
☑ Apply button for filters and slicers
  - User must explicitly apply selections
  
☑ Reduce number of queries sent by visuals
  - Cross-highlighting doesn't query until complete

Native SQL Query Monitoring

Performance Analyzer → Start Recording → Interact with visual

Review "Direct Query" section:
- Query: Shows SQL sent to source
- Duration: Time for source to return data

Optimization targets:
- < 1 second: Excellent
- 1-3 seconds: Acceptable
- > 3 seconds: Needs optimization (add indexes, partitioning, views)

DirectQuery Best Practices

// ❌ BAD: Complex calculated columns in DirectQuery
// These get recalculated per query
Sales[Profit] = Sales[Amount] - Sales[Cost]  // Calculated column (avoid in DQ)

// ✅ GOOD: Pre-calculate in source
// Create view or computed column in SQL Server:
CREATE VIEW Sales_With_Profit AS
SELECT
    OrderID,
    Amount,
    Cost,
    (Amount - Cost) AS Profit
FROM Sales;

// ❌ BAD: Row-level security with complex logic in DirectQuery
[RLS Rule] = 
CALCULATE(
    DISTINCTCOUNT(Customer[CustomerID]),
    FILTER(Customer, Customer[Region] = USERPRINCIPALNAME())
) > 0

// ✅ GOOD: Implement RLS in source database
-- SQL Server row-level security:
CREATE SECURITY POLICY CustomerFilter
ADD FILTER PREDICATE dbo.fn_securitypredicate(UserName)
ON dbo.Customer
WITH (STATE = ON);

Hybrid Table Patterns

Scenario: Recent data (hot) + Historical data (cold)

Implementation:
1. Partition table in source:
   - Sales_Hot: Last 3 months (DirectQuery)
   - Sales_Historical: Older data (Import)

2. Create union view in Power BI:
Sales = 
UNION(
    Sales_Hot,        // DirectQuery (real-time)
    Sales_Historical  // Import (fast performance)
)

3. Benefits:
   - Recent data always current
   - Historical analysis performant
   - Reduced source load
   - Flexible refresh schedule (historical: weekly; hot: real-time)

Advanced Relationship Patterns

Bidirectional Filtering

Use with Caution:

Standard: Single direction (Dimension → Fact)
Product → Sales: Product filters Sales (one-to-many)

Bidirectional: Both directions
Product ↔ Sales: Product filters Sales AND Sales filters Product

When to use bidirectional:
✓ Many-to-many relationships
✓ Bridging tables
✓ Specific advanced scenarios

Risks:
✗ Ambiguous filter propagation
✗ Performance degradation
✗ Unexpected results in complex models

Example Scenario:

Tables: Sales ↔ Budget (many-to-many)

Goal: Show products with sales but no budget

Solution:
1. Create bridge table: ProductBridge (ProductID)
2. Relationships:
   - Product → ProductBridge (one-to-many, single)
   - Sales → ProductBridge (many-to-one, single)
   - Budget → ProductBridge (many-to-one, single)
3. Enable bidirectional on ProductBridge → Product

Result: Sales and Budget cross-filter through bridge

Inactive Relationships with USERELATIONSHIP

// Date dimension with multiple roles

// Active relationship: Sales[OrderDate] → Date[Date]
Sales by Order Date = SUM(Sales[Amount])

// Inactive: Sales[ShipDate] → Date[Date]
Sales by Ship Date = 
CALCULATE(
    SUM(Sales[Amount]),
    USERELATIONSHIP(Sales[ShipDate], 'Date'[Date])
)

// Inactive: Sales[DueDate] → Date[Date]
Sales by Due Date = 
CALCULATE(
    SUM(Sales[Amount]),
    USERELATIONSHIP(Sales[DueDate], 'Date'[Date])
)

// Dynamic selection with parameter
Sales by Selected Date = 
VAR SelectedRole = SELECTEDVALUE('DateRole'[Role], "OrderDate")
RETURN
    SWITCH(
        SelectedRole,
        "OrderDate", [Sales by Order Date],
        "ShipDate", [Sales by Ship Date],
        "DueDate", [Sales by Due Date],
        BLANK()
    )

Circular Dependency Resolution

Problem: A → B → C → A (circular relationship)

Example:
- Employee → Manager (Employee table self-reference)
- Sales → Product → Category → SubcategoryBridge → Sales

Solutions:

1. Flatten hierarchy (best for small hierarchies):
   - Denormalize Category into Product table
   - Single relationship: Sales → Product

2. Use calculated table to break cycle:
   - Keep relationships: Sales → Product → Category
   - Create: CategoryList = VALUES(Category)
   - Use CategoryList for slicing (no relationship)

3. DAX-based filtering (for complex scenarios):
   - Remove problematic relationship
   - Use TREATAS or CROSSFILTER in measures

Data Model Optimization Techniques

Column Optimization

Reduce Cardinality:

1. Remove high-cardinality text columns:
   ❌ Keep: Customer[Email] (5M unique values)
   ✅ Remove: Use CustomerID for relationships only

2. Convert to low-cardinality:
   ❌ Sales[Comments] (free text, millions of unique values)
   ✅ Sales[Has Comments] (Boolean: TRUE/FALSE)

3. Optimize data types:
   ❌ Text: "2025-01-27" (larger storage)
   ✅ Date: 2025-01-27 (optimized storage)
   
   ❌ Decimal: 123.4567890 (8 bytes)
   ✅ Whole Number: 1234 (4 bytes) if decimals unnecessary

4. Use integer keys instead of text:
   ❌ Product[SKU] = "PROD-12345-XYZ" (text)
   ✅ Product[ProductID] = 12345 (integer, relationship key)

Model Compression Statistics

View model size:
External Tools → DAX Studio → View → DMV Browser → $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS

Key metrics:
- Table Size: MB per table
- Column Cardinality: Unique values
- Dictionary Size: Compressed size
- Data Type: Optimization opportunities

Target reductions:
- Remove unused columns: 10-50% size reduction
- Optimize data types: 20-40% size reduction
- Reduce cardinality: 30-60% size reduction
- Remove calculated columns: 10-30% size reduction (replace with measures)

Incremental Refresh Configuration

Scenario: 10-year historical Sales table, daily updates

Configuration:
1. Define parameters (Power Query):
   - RangeStart (DateTime)
   - RangeEnd (DateTime)

2. Filter table using parameters:
   Sales = Table.SelectRows(Source, each 
       [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)

3. Configure incremental refresh policy (Model view):
   - Right-click Sales table → Incremental refresh
   - Archive data starting: 9 years before refresh date
   - Incrementally refresh data starting: 7 days before refresh date
   - Detect data changes: [OrderDate]

4. Benefits:
   - Full refresh only needed for last 7 days (fast)
   - Historical data (9+ years) never refreshed (stable)
   - Automatic partition management
   - 100× faster refresh times

Data Model Versioning and Governance

Model Documentation

// Table-level documentation
// Right-click table → Properties → Description

Sales_Description = 
"Fact table: Sales transactions
Source: SQL Server - AdventureWorks.dbo.Sales
Refresh: Daily at 2 AM UTC
Grain: One row per order line item
Last Modified: 2025-01-27 by Vladimir Luis"

// Measure-level documentation
Total Sales = SUM(Sales[Amount])
// Description: "Sum of all sales amounts. Includes discounts. Excludes returns (use [Net Sales] for returns-adjusted)."

// Relationship documentation (in model diagram)
// Add text boxes with notes:
"⚠️ OrderDate is active relationship
ShipDate and DueDate are inactive - use USERELATIONSHIP() in measures"

Change Tracking

Version control for Power BI models:

1. Save as PBIX + PBIT (template):
   - PBIX: Full model with data
   - PBIT: Model structure only (no data)
   - Commit PBIT to Git for version control

2. Use external tools:
   - Tabular Editor 2/3: Extract model as JSON
   - Commit JSON to Git
   - Track changes with diff tools

3. Document changes:
   - Changelog table in model
   - Version number in report title
   - Release notes page

Changelog = #table(
    {"Version", "Date", "Changes", "Author"},
    {
        {"2.1", #date(2025,1,27), "Added sales aggregations", "Vladimir"},
        {"2.0", #date(2025,1,15), "Implemented composite model", "Vladimir"},
        {"1.5", #date(2024,12,10), "Added DirectQuery for real-time data", "Team"}
    }
)

Model Health Checks

// Create "Model Health" page with these measures:

Tables Count = COUNTROWS(INFO.TABLES())

Relationships Count = COUNTROWS(INFO.RELATIONSHIPS())

Measures Count = COUNTROWS(INFO.MEASURES())

Calculated Columns Count = COUNTROWS(INFO.COLUMNS("CALCULATED_COLUMN"))

Unused Columns = 
VAR AllColumns = INFO.COLUMNS()
VAR UsedInVisuals = INFO.COLUMNS("USED_IN_VISUALS")
RETURN COUNTROWS(EXCEPT(AllColumns, UsedInVisuals))

Model Size MB = 
SUMX(
    INFO.STORAGE_TABLE_COLUMNS(),
    [DICTIONARY_SIZE] + [DATA_SIZE]
) / 1024 / 1024

Refresh Duration = 
"Last refresh: " & FORMAT([Last Refresh Time], "YYYY-MM-DD HH:mm")

// Alert: Tables without relationships
Orphaned Tables = 
VAR AllTables = VALUES(INFO.TABLES[TABLE_NAME])
VAR TablesInRelationships = 
    UNION(
        VALUES(INFO.RELATIONSHIPS[FROM_TABLE]),
        VALUES(INFO.RELATIONSHIPS[TO_TABLE])
    )
RETURN COUNTROWS(EXCEPT(AllTables, TablesInRelationships))

Performance Optimization

-- Bad: Multiple CALCULATE calls
Slow Measure = 
CALCULATE([Total Sales], Product[Category] = "A") +
CALCULATE([Total Sales], Product[Category] = "B") +
CALCULATE([Total Sales], Product[Category] = "C")

-- Good: Single CALCULATE with filter
Fast Measure = 
CALCULATE(
    [Total Sales],
    Product[Category] IN {"A", "B", "C"}
)

-- Bad: Nested CALCULATE
Slow = 
CALCULATE(
    CALCULATE(
        [Total Sales],
        Product[Category] = "A"
    ),
    'Date'[Year] = 2025
)

-- Good: Combined filters
Fast = 
CALCULATE(
    [Total Sales],
    Product[Category] = "A",
    'Date'[Year] = 2025
)

-- Use variables for repeated calculations
Optimized Measure = 
VAR TotalSales = [Total Sales]
VAR TotalCost = [Total Cost]
VAR Profit = TotalSales - TotalCost
VAR Margin = DIVIDE(Profit, TotalSales, 0)
RETURN
    IF(Margin > 0.2, "High", "Low")

Data Model Design Patterns and Anti-Patterns

✅ Design Patterns (Best Practices)

1. Conformed Dimensions

Pattern: Share dimension tables across multiple fact tables

Example:
- Sales fact → Date dimension
- Budget fact → Date dimension  
- Inventory fact → Date dimension
(All use same Date dimension)

Benefits:
✓ Consistent filtering across facts
✓ Simplified model maintenance
✓ Reduced model size
✓ Easier user experience

Implementation:
- Create single Date dimension
- Establish relationships from all facts
- Use in slicers for cross-fact filtering

2. Junk Dimensions

Pattern: Combine low-cardinality flags into single dimension

Instead of:
- Sales[IsOnline] (Boolean)
- Sales[IsPriority] (Boolean)  
- Sales[IsInternational] (Boolean)
- Sales[PaymentMethod] (3 values)

Create:
OrderAttributes dimension:
- AttributeID (1-24, all combinations)
- IsOnline (Y/N)
- IsPriority (Y/N)
- IsInternational (Y/N)
- PaymentMethod (Cash/Card/Online)

Benefits:
✓ Reduces fact table width
✓ Improves compression
✓ Easier to add new flags
✓ Better performance

3. Surrogate Keys

Pattern: Integer keys instead of natural keys

❌ Natural Key: Product[SKU] = "PROD-ELECT-12345-XYZ"
✅ Surrogate Key: Product[ProductID] = 12345

Benefits:
✓ Faster joins (integer vs text)
✓ Smaller storage
✓ Handles source key changes
✓ Better compression

Implementation:
-- Power Query: Add Index Column
= Table.AddIndexColumn(Source, "ProductID", 1, 1)

4. Snowflake to Star Conversion

Snowflake (normalized):
Sales → Product → Category → SubCategory (3 hops)

Star (denormalized):
Sales → Product (with Category, SubCategory columns)

Conversion in Power Query:
Product_Flattened = 
    Table.ExpandTableColumn(
        Table.NestedJoin(Product, "CategoryID", Category, "CategoryID", "Category"),
        "Category", {"CategoryName", "SubCategoryName"}
    )

Benefits:
✓ Faster queries (fewer joins)
✓ Simplified relationships
✓ Better for business users

5. Fact Table Consolidation

Multiple fact tables with same grain → Single fact table with type dimension

Instead of:
- Sales_Online (OrderID, Amount, Quantity)
- Sales_Store (OrderID, Amount, Quantity)
- Sales_Phone (OrderID, Amount, Quantity)

Create:
- Sales (OrderID, Amount, Quantity, ChannelID)
- Channel (ChannelID, ChannelName: Online/Store/Phone)

Benefits:
✓ Simpler model
✓ Unified analysis
✓ Easier maintenance
✓ Single source of truth

Implementation:
Combined_Sales = Table.Combine({
    Table.AddColumn(Sales_Online, "Channel", each "Online"),
    Table.AddColumn(Sales_Store, "Channel", each "Store"),
    Table.AddColumn(Sales_Phone, "Channel", each "Phone")
})

❌ Anti-Patterns (Common Mistakes)

1. Multiple Fact Tables at Different Grains

❌ WRONG:
- Sales (OrderID, ProductID, DailyTotal) -- Order-level
- Inventory (ProductID, Date, Quantity) -- Daily level  
→ Cannot reliably join or compare

✅ CORRECT:
- Bring both to common grain (Product × Date)
- OR create separate reports for each grain
- OR use calculation groups for different perspectives

2. Bidirectional Relationships Everywhere

❌ WRONG:
Product ↔ Sales ↔ Customer ↔ Region (all bidirectional)
→ Ambiguous filter paths, unexpected results, performance issues

✅ CORRECT:
Product → Sales ← Customer ← Region (single direction)
→ Clear filter propagation, predictable results

3. Many Calculated Columns

❌ WRONG:
Sales[Profit] = Sales[Revenue] - Sales[Cost]  (Calculated column)
Sales[Margin] = DIVIDE(Sales[Profit], Sales[Revenue], 0)  (Calculated column)
→ Increases model size, slower refresh, wastes memory

✅ CORRECT:
Profit = SUM(Sales[Revenue]) - SUM(Sales[Cost])  (Measure)
Margin = DIVIDE([Profit], SUM(Sales[Revenue]), 0)  (Measure)
→ Calculated on-demand, no storage, flexible aggregation

4. Import Mode for Real-Time Data

❌ WRONG:
Importing live IoT sensor data (updates every second)
→ Constant refresh load, data never current, infrastructure strain

✅ CORRECT:
DirectQuery for real-time data
Import for historical analysis
Composite model combining both

5. No Date Table

❌ WRONG:
Using Sales[OrderDate] directly for time intelligence
→ Time intelligence functions fail, no fiscal calendar support

✅ CORRECT:
Create proper date dimension with contiguous dates
Mark as date table
Use for all time intelligence

6. Circular Relationships

❌ WRONG:
Employee → Manager → Department → Employee (circular)
→ Model won't load, calculations fail

✅ CORRECT:
Break cycle: Remove one relationship
Use DAX (USERELATIONSHIP, TREATAS) for complex scenarios
OR denormalize to eliminate circular dependency

Decision Framework

When to Import vs DirectQuery:

Factor	Import	DirectQuery
Data size	< 10GB compressed	> 10GB or very large
Refresh frequency	Daily or less frequent	Real-time required
Query performance need	Critical (sub-second)	Acceptable (1-5 seconds)
Data sovereignty	Flexible	Must stay in source
DAX complexity	Full DAX support	Limited DAX support
Network reliability	Not critical	Must be reliable

When to Use Calculated Column vs Measure:

Use Calculated Column When:	Use Measure When:
Need to filter/slice by result	Aggregating data (SUM, AVG, COUNT)
Value is static (e.g., Full Name)	Result changes with filter context
Required in relationships	Performance is critical
Low cardinality result	Dynamic business logic

When to Use Bidirectional Relationships:

✅ Use bidirectional when:

Many-to-many with bridge table
Specific measure requires it
Model is small and performance acceptable

❌ Avoid bidirectional when:

Standard star schema (use single direction)
Performance is critical
Model has many tables (complexity)
Can achieve same result with DAX

Key Takeaways

Star schema is the foundation: Fact tables (measures) + dimension tables (attributes) = optimal analytics model
Relationships drive filtering: One-to-many from dimensions to facts; understand cardinality and filter direction
Storage mode matters: Import for performance, DirectQuery for real-time, Composite for best of both worlds
Calculated columns vs measures: Columns computed at refresh (static), measures at query time (dynamic) - prefer measures
Date dimension is mandatory: Contiguous date table marked as date table enables time intelligence functions
Aggregations accelerate DirectQuery: Pre-calculate summaries for 100-1000× performance gains on large datasets
Multiple relationships via USERELATIONSHIP: Role-playing dimensions (OrderDate, ShipDate) with inactive relationships
Model optimization reduces size: Remove unused columns, optimize data types, use integer keys, reduce cardinality
Incremental refresh for large tables: Partition by date, refresh recent data only, archive historical data
Documentation prevents confusion: Comment tables/measures, track versions, create model health dashboards
Design patterns prevent rework: Conformed dimensions, junk dimensions, surrogate keys, fact consolidation
Anti-patterns cause problems: Avoid many calculated columns, bidirectional relationships everywhere, circular dependencies
Composite models enable hybrid scenarios: Real-time facts (DirectQuery) + performant dimensions (Import/Dual)
Performance profiling is essential: Use Performance Analyzer and DAX Studio to identify and fix bottlenecks
Filter context understanding is critical: Visuals create filter context; relationships propagate it; CALCULATE modifies it

Next Steps

Design star schema for your data
Create date dimension table
Establish table relationships
Build core measures library
Implement time intelligence
Optimize model size
Test query performance
Document data model

Additional Resources

Model. Relate. Calculate. Visualize.