LINQ Mastery: Advanced Query Techniques in C#

Introduction

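LINQ (Language Integrated Query) is a declarative abstraction for querying data in C#: you describe what you want and let the operators handle the iteration. Compared with imperative loops, LINQ queries are composable, type-safe, and easier to read.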

Prerequisites

  • C# 12+ knowledge
  • Understanding of delegates and lambda expressions
  • .NET 9 SDK

LINQ Fundamentals Recap

Syntax          Type         Use Case
Query Syntax    SQL-like     Readable for simple queries
Method Syntax   Fluent API   Composable, full operator access
Mixed Syntax    Hybrid       Query + method chaining

Step-by-Step Guide

Step 1: Deferred vs Immediate Execution

Deferred (Lazy):

var query = numbers.Where(n => n > 5); // Not executed yet
var result = query.ToList(); // Executes now

Immediate:

var count = numbers.Count(n => n > 5); // Executes immediately
var sum = numbers.Sum(); // Immediate
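
The difference is easiest to see when the source changes between building a query and enumerating it. The following is a minimal sketch; the list contents and variable names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 3, 6, 9 };

// Building the query does no work yet; it only records the filter.
var query = numbers.Where(n => n > 5);

// A change made before enumeration is still picked up.
numbers.Add(12);

// ToList() runs the filter now and stores a snapshot: 6, 9, 12.
var deferredResult = query.ToList();

// The snapshot ignores later changes, but the query itself re-runs each time.
numbers.Add(15);
var liveCount = query.Count();

Console.WriteLine(string.Join(", ", deferredResult)); // 6, 9, 12
Console.WriteLine(liveCount);                         // 4
```

The snapshot stays at three items while the query, enumerated again, sees the fourth.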

Step 2: Custom LINQ Operators

public static class LinqExtensions
{
    public static IEnumerable<T> WhereIf<T>(this IEnumerable<T> source, bool condition, Func<T, bool> predicate)
    {
        return condition ? source.Where(predicate) : source;
    }

    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }
        if (batch.Count > 0) yield return batch; // emit the final, partial batch
    }
}

// Usage
var filtered = products
    .WhereIf(includeDiscounted, p => p.Discount > 0)
    .ToList();

var batches = largeList.Batch(100);
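
Before writing a batching operator by hand, it is worth knowing that .NET 6 and later ship Enumerable.Chunk, which covers the same ground. The sketch below uses a small made-up range so the output is easy to check.

```csharp
using System;
using System.Linq;

// Enumerable.Chunk splits a sequence into arrays of at most `size` items.
var chunks = Enumerable.Range(1, 7).Chunk(3).ToList();

foreach (var chunk in chunks)
{
    Console.WriteLine(string.Join(", ", chunk));
}
// 1, 2, 3
// 4, 5, 6
// 7
```

A custom Batch is still useful when you need a different return type or extra behavior per batch.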

Step 3: Expression Trees

Dynamic Filtering:

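// Must live in a static class (e.g. LinqExtensions) to be callable as an extension method.
public static IQueryable<T> ApplyFilter<T>(this IQueryable<T> query, string propertyName, object value)
{
    var parameter = Expression.Parameter(typeof(T), "x");               // x
    var property = Expression.Property(parameter, propertyName);        // x.Property
    var constant = Expression.Constant(value, property.Type);           // typed to match the property
    var equality = Expression.Equal(property, constant);                // x.Property == value
    var lambda = Expression.Lambda<Func<T, bool>>(equality, parameter); // x => x.Property == value
    return query.Where(lambda);
}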

// Usage
var filteredProducts = products.AsQueryable()
    .ApplyFilter("Category", "Electronics");
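
To see what such a filter builds, it can help to assemble one tree by hand and run it against an in-memory source. This sketch uses string and its Length property as a stand-in for an entity type, which is an assumption made only to keep the example self-contained.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Build x => x.Length == 3 piece by piece, for T = string.
var parameter = Expression.Parameter(typeof(string), "x");
var property = Expression.Property(parameter, "Length");
var constant = Expression.Constant(3, property.Type);
var body = Expression.Equal(property, constant);
var lambda = Expression.Lambda<Func<string, bool>>(body, parameter);

// The tree is data: it can be printed, inspected, or handed to a query provider.
Console.WriteLine(lambda);

// AsQueryable() lets the in-memory provider compile and run the tree.
var words = new[] { "cat", "horse", "dog" }.AsQueryable();
var threeLetterWords = words.Where(lambda).ToList();

Console.WriteLine(string.Join(", ", threeLetterWords)); // cat, dog
```

Against EF Core the same tree would be translated to SQL rather than compiled, which is why the filter takes IQueryable rather than IEnumerable.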

Step 4: Advanced Joins

Left Join:

var leftJoin = from customer in customers
               join order in orders on customer.Id equals order.CustomerId into orderGroup
               from order in orderGroup.DefaultIfEmpty()
               select new { customer.Name, OrderId = order?.Id ?? 0 };

Group Join:

var grouped = from customer in customers
              join order in orders on customer.Id equals order.CustomerId into orderGroup
              select new { customer.Name, Orders = orderGroup };

Multiple Key Join:

var result = from order in orders
             join detail in orderDetails on new { order.OrderId, order.Year } equals new { detail.OrderId, detail.Year }
             select new { order, detail };
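
The left join above can also be written in method syntax, which is handy when the rest of a query is already a method chain. The sketch below uses anonymous types with made-up customers and orders; because anonymous types are reference types, DefaultIfEmpty() yields null for a customer without orders.

```csharp
using System;
using System.Linq;

var customers = new[]
{
    new { Id = 1, Name = "Ada" },
    new { Id = 2, Name = "Grace" }
};
var orders = new[]
{
    new { Id = 10, CustomerId = 1 },
    new { Id = 11, CustomerId = 1 }
};

var leftJoin = customers
    // Pair each customer with the (possibly empty) group of matching orders.
    .GroupJoin(orders, c => c.Id, o => o.CustomerId, (c, matches) => new { c, matches })
    // Flatten the groups, emitting one null placeholder for empty ones.
    .SelectMany(
        x => x.matches.DefaultIfEmpty(),
        (x, o) => new { x.c.Name, OrderId = o?.Id ?? 0 })
    .ToList();

foreach (var row in leftJoin)
{
    Console.WriteLine($"{row.Name}: {row.OrderId}");
}
// Ada: 10
// Ada: 11
// Grace: 0
```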

Step 5: GroupBy with Complex Keys

var grouped = products
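    // Truncate to an int so prices fall into 100-unit bands; a decimal quotient would give almost every price its own group.
    .GroupBy(p => new { p.Category, PriceRange = (int)(p.Price / 100) })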
    .Select(g => new
    {
        g.Key.Category,
        g.Key.PriceRange,
        Count = g.Count(),
        AvgPrice = g.Average(p => p.Price)
    });
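
When Price is a decimal, dividing by 100 does not truncate, so the bucket key has to be truncated explicitly for items to share a band. The following sketch, with made-up products, shows composite-key grouping with an integer band.

```csharp
using System;
using System.Linq;

var products = new[]
{
    new { Category = "Audio", Price = 49.99m },
    new { Category = "Audio", Price = 89.00m },
    new { Category = "Audio", Price = 120.00m },
    new { Category = "Video", Price = 310.50m }
};

var bands = products
    // Anonymous types compare by value, so they work as composite keys.
    .GroupBy(p => new { p.Category, PriceBand = (int)(p.Price / 100) })
    .Select(g => new { g.Key.Category, g.Key.PriceBand, Count = g.Count() })
    .ToList();

foreach (var band in bands)
{
    Console.WriteLine($"{band.Category} band {band.PriceBand}: {band.Count}");
}
// Audio band 0: 2
// Audio band 1: 1
// Video band 3: 1
```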

Step 6: Nested LINQ Queries

var result = customers.Select(c => new
{
    c.Name,
    TopOrders = c.Orders
        .OrderByDescending(o => o.Total)
        .Take(3)
        .Select(o => new { o.Id, o.Total })
});

Step 7: Aggregation with Custom Accumulators

var summary = orders.Aggregate(
    new { TotalRevenue = 0m, OrderCount = 0 },
    (acc, order) => new
    {
        TotalRevenue = acc.TotalRevenue + order.Total,
        OrderCount = acc.OrderCount + 1
    }
);
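
The same pattern works with a value tuple as the accumulator, which avoids allocating a new anonymous object per element. A minimal sketch with made-up order totals:

```csharp
using System;
using System.Linq;

var totals = new[] { 120.00m, 80.50m, 49.50m };

// The seed fixes the accumulator type; each step returns an updated copy.
var summary = totals.Aggregate(
    (Revenue: 0m, Count: 0),
    (acc, total) => (Revenue: acc.Revenue + total, Count: acc.Count + 1));

// Derive the average from the single pass, guarding against an empty input.
var average = summary.Count == 0 ? 0m : summary.Revenue / summary.Count;

Console.WriteLine($"{summary.Count} orders, revenue {summary.Revenue}");
```

Computing revenue and count in one pass matters most when the source is expensive to enumerate twice.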

Step 8: Parallel LINQ (PLINQ)

var result = largeCollection
    .AsParallel()
    .Where(x => ExpensiveOperation(x))
    .Select(x => Transform(x))
    .ToList();

// Control parallelism
var controlled = largeCollection
    .AsParallel()
    .WithDegreeOfParallelism(4)
    .Where(x => x.IsValid)
    .ToList();
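
One detail to keep in mind: PLINQ does not guarantee that results come back in source order unless you ask for it. The sketch below, using a made-up range of integers, adds AsOrdered() and checks the output against the sequential version.

```csharp
using System;
using System.Linq;

var source = Enumerable.Range(1, 1_000).ToArray();

var squares = source
    .AsParallel()
    .AsOrdered()               // preserve source order in the results
    .Select(x => (long)x * x)
    .ToList();

var sequential = source.Select(x => (long)x * x).ToList();

Console.WriteLine(squares.SequenceEqual(sequential)); // True
```

Ordering adds buffering overhead, so leave it out when the consumer does not care about order.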

Performance Optimization

Avoid Multiple Enumerations

Wrong:

if (query.Any())
{
    var first = query.First(); // Re-enumerates
}

Correct:

var list = query.ToList();
if (list.Any())
{
    var first = list.First();
}
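
To confirm that a query really is being enumerated more than once, you can count the passes. This sketch uses a hypothetical local iterator, Source(), that increments a counter each time it is iterated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int enumerations = 0;

// Hypothetical source that records every pass over it.
IEnumerable<int> Source()
{
    enumerations++;
    yield return 4;
    yield return 8;
}

var query = Source().Where(n => n > 5);

// Any() and First() each start a fresh pass over the source.
if (query.Any())
{
    var first = query.First();
}
int afterAnyAndFirst = enumerations;

// Materializing once costs one more pass; list operations cost none.
var list = query.ToList();
if (list.Count > 0)
{
    var first = list[0];
}
int afterToList = enumerations;

Console.WriteLine(afterAnyAndFirst); // 2
Console.WriteLine(afterToList);      // 3
```

With a database-backed query, each of those extra passes would be a separate round trip.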

Use Indexed Access for Random Lookups

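// Wrong: O(n) scan on every lookup
var customer = customers.First(c => c.Id == id);

// Correct: build the lookup once (O(n)); each lookup after that is O(1).
// Only worth it when you look up more than a handful of ids.
var lookup = customers.ToLookup(c => c.Id);
var match = lookup[id].FirstOrDefault();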
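
When the key is unique, as an Id usually is, a dictionary expresses the intent more directly than a lookup. A minimal sketch with made-up customers:

```csharp
using System;
using System.Linq;

var customers = new[]
{
    new { Id = 1, Name = "Ada" },
    new { Id = 2, Name = "Grace" }
};

// Built once in O(n); throws ArgumentException if two customers share an Id.
var byId = customers.ToDictionary(c => c.Id);

// TryGetValue avoids an exception when the key is missing.
var found = byId.TryGetValue(2, out var customer) ? customer.Name : "(none)";
var missing = byId.TryGetValue(99, out var other) ? other.Name : "(none)";

Console.WriteLine(found);   // Grace
Console.WriteLine(missing); // (none)
```

ToLookup remains the better fit when several items can share a key.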

Project Only the Columns You Need (EF Core)

// Wrong: Fetches all columns
var result = dbContext.Products
    .Where(p => p.IsActive)
    .ToList();

// Correct: Select only needed columns
var result = dbContext.Products
    .Where(p => p.IsActive)
    .Select(p => new { p.Id, p.Name, p.Price })
    .ToList();

Common Anti-Patterns

Anti-Pattern 1: Nested ToList Calls

// Wrong: loads every customer into memory, then (with lazy loading) queries Orders once per customer
var result = customers.ToList()
    .Where(c => c.Orders.ToList().Any(o => o.Total > 1000))
    .ToList();

// Correct: Single query
var result = customers
    .Where(c => c.Orders.Any(o => o.Total > 1000))
    .ToList();

Anti-Pattern 2: Unnecessary Intermediate Collections

// Wrong
var temp = list.Where(x => x.IsActive).ToList();
var result = temp.Select(x => x.Name).ToList();

// Correct
var result = list
    .Where(x => x.IsActive)
    .Select(x => x.Name)
    .ToList();

Advanced Scenarios

Dynamic Sorting

public static IQueryable<T> OrderByProperty<T>(this IQueryable<T> source, string propertyName, bool descending = false)
{
    var parameter = Expression.Parameter(typeof(T), "x");
    var property = Expression.Property(parameter, propertyName);
    var lambda = Expression.Lambda(property, parameter);

    var methodName = descending ? "OrderByDescending" : "OrderBy";
    var resultExpression = Expression.Call(
        typeof(Queryable),
        methodName,
        new Type[] { typeof(T), property.Type },
        source.Expression,
        Expression.Quote(lambda)
    );

    return source.Provider.CreateQuery<T>(resultExpression);
}

Recursive Queries

public static IEnumerable<T> TraverseTree<T>(T root, Func<T, IEnumerable<T>> childSelector)
{
    var stack = new Stack<T>();
    stack.Push(root);

    while (stack.Count > 0)
    {
        var current = stack.Pop();
        yield return current;

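        // Push in reverse so children are visited in their original order (requires System.Linq).
        foreach (var child in childSelector(current).Reverse())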
        {
            stack.Push(child);
        }
    }
}

// Usage
var allNodes = TraverseTree(rootCategory, c => c.Subcategories);

Troubleshooting

Issue: Query not executing against database
Solution: Use IQueryable instead of IEnumerable; avoid .ToList() before filtering

Issue: OutOfMemoryException with large datasets
Solution: Use streaming with yield return; avoid .ToList() on large collections

Issue: Poor performance with complex queries
Solution: Profile with EF Core logging; add indexes; consider raw SQL for edge cases

Best Practices

  • Prefer method syntax for complex queries
  • Use IQueryable for database queries to enable server-side execution
  • Avoid side effects in lambda expressions
  • Leverage AsNoTracking() for read-only EF queries
  • Use PLINQ only for CPU-bound operations

Key Takeaways

  • Deferred execution lets you compose a query step by step; no work is done until the results are enumerated.
  • Expression trees power dynamic query building.
  • Custom operators extend LINQ for domain-specific needs.
  • Performance optimization requires understanding execution context.

Next Steps

  • Build expression tree-based dynamic filtering library
  • Implement query result caching strategy
  • Explore async LINQ (IAsyncEnumerable)

Which LINQ pattern will you apply first?