Azure Monitor and Application Insights: Complete Observability

Introduction

Azure Monitor provides unified observability across applications, infrastructure, and networks. Application Insights delivers deep application performance monitoring (APM) with distributed tracing, while Log Analytics enables powerful KQL queries for troubleshooting and insights.

Prerequisites

  • Azure subscription
  • Application deployed to Azure (Web App, Container App, VM, or AKS)
  • Basic understanding of logging and metrics

Observability Pillars

Pillar   | Azure Service          | Data Type
---------|------------------------|-------------------------------
Metrics  | Azure Monitor Metrics  | Time-series numerical data
Logs     | Log Analytics          | Structured event logs
Traces   | Application Insights   | Distributed transaction flows
Alerts   | Azure Monitor Alerts   | Proactive notifications

Step-by-Step Guide

Step 1: Enable Application Insights

Azure Portal:

  1. Resource → Monitoring → Application Insights → Enable
  2. Select or create Log Analytics workspace
  3. Copy the connection string (preferred over the legacy instrumentation key)

Azure CLI:

az monitor app-insights component create \
  --app myapi-insights \
  --location eastus \
  --resource-group rg-monitoring \
  --workspace /subscriptions/.../workspaces/logs-workspace

Step 2: Instrument Application

ASP.NET Core:

// Program.cs
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddApplicationInsightsTelemetry(options =>
{
    options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
    options.EnableAdaptiveSampling = true;
    options.EnableQuickPulseMetricStream = true;
});

var app = builder.Build();

// Middleware for request tracking
app.UseHttpsRedirection();
app.UseAuthorization();
app.MapControllers();

app.Run();

Custom Telemetry:

public class OrderController : ControllerBase
{
    private readonly TelemetryClient _telemetry;
    private readonly IOrderRepository _repository;   // the app's own data-access abstraction, used below

    public OrderController(TelemetryClient telemetry, IOrderRepository repository)
    {
        _telemetry = telemetry;
        _repository = repository;
    }

    [HttpPost]
    public async Task<IActionResult> CreateOrder(Order order)
    {
        using var operation = _telemetry.StartOperation<RequestTelemetry>("CreateOrder");
        
        try
        {
            // Custom event
            _telemetry.TrackEvent("OrderCreated", new Dictionary<string, string>
            {
                ["OrderId"] = order.Id,
                ["CustomerId"] = order.CustomerId,
                ["Amount"] = order.Total.ToString()
            });

            // Custom metric
            _telemetry.TrackMetric("OrderValue", order.Total);

            // Dependency tracking (auto-captured for HTTP/SQL)
            await _repository.SaveOrderAsync(order);

            return Ok(order);
        }
        catch (Exception ex)
        {
            _telemetry.TrackException(ex);
            operation.Telemetry.Success = false;
            throw;
        }
    }
}
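
HTTP and SQL calls are collected automatically, but calls to systems the SDK does not recognize (a message broker, a legacy TCP service) can be reported with TrackDependency so they appear in the Application Map and end-to-end transaction view alongside the auto-collected calls. A minimal sketch; the _messageBus client and queue name are illustrative:

public async Task PublishOrderAsync(Order order)
{
    var startTime = DateTimeOffset.UtcNow;
    var timer = System.Diagnostics.Stopwatch.StartNew();
    var success = false;

    try
    {
        await _messageBus.PublishAsync(order);   // hypothetical broker client
        success = true;
    }
    finally
    {
        timer.Stop();
        // Record the call as a dependency with its duration and outcome
        _telemetry.TrackDependency("Queue", "orders-queue", "Publish",
            startTime, timer.Elapsed, success);
    }
}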

Node.js:

const express = require('express');
const appInsights = require('applicationinsights');

// Prefer the full connection string over the legacy instrumentation key
appInsights.setup('InstrumentationKey=...')
    .setAutoDependencyCorrelation(true)
    .setAutoCollectRequests(true)
    .setAutoCollectPerformance(true)
    .start();

const client = appInsights.defaultClient;
const app = express();
app.use(express.json());

app.post('/orders', async (req, res) => {
    client.trackEvent({ name: 'OrderCreated', properties: { orderId: req.body.id } });
    client.trackMetric({ name: 'OrderValue', value: req.body.total });
    
    try {
        await saveOrder(req.body);
        res.json({ success: true });
    } catch (error) {
        client.trackException({ exception: error });
        res.status(500).json({ error: error.message });
    }
});

Step 3: KQL Queries for Insights

Failed Requests Analysis:

requests
| where success == false
| where timestamp > ago(24h)
| summarize FailureCount = count() by operation_Name, resultCode
| order by FailureCount desc
| take 10

Slow Request Identification:

requests
| where timestamp > ago(1h)
| where duration > 5000  // milliseconds
| project timestamp, operation_Name, duration, url
| order by duration desc

Dependency Performance:

dependencies
| where timestamp > ago(24h)
| summarize 
    AvgDuration = avg(duration),
    P95Duration = percentile(duration, 95),
    FailureRate = countif(success == false) * 100.0 / count()
    by target, type
| order by P95Duration desc

User Journey Tracking:

customEvents
| where timestamp > ago(7d)
| where name in ("ProductViewed", "AddedToCart", "CheckoutStarted", "OrderCompleted")
| summarize EventCount = count() by name
| render piechart

Funnel Analysis:

let startDate = ago(30d);
let endDate = now();
customEvents
| where timestamp between (startDate .. endDate)
| where name in ("ProductViewed", "AddedToCart", "CheckoutStarted", "OrderCompleted")
| summarize Users = dcount(user_Id) by name
| order by Users desc
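
These queries can also be run programmatically against the Log Analytics workspace with the Azure.Monitor.Query SDK, which is useful for exporting results or feeding custom reports. A minimal sketch, assuming the workspace ID is available (the placeholder below is illustrative) and the exact client API matches the current SDK version:

using Azure.Identity;
using Azure.Monitor.Query;

var client = new LogsQueryClient(new DefaultAzureCredential());

// Run the failed-requests query from above over the last 24 hours
var result = await client.QueryWorkspaceAsync(
    "<log-analytics-workspace-id>",
    @"requests
      | where success == false
      | summarize FailureCount = count() by operation_Name, resultCode
      | order by FailureCount desc
      | take 10",
    new QueryTimeRange(TimeSpan.FromHours(24)));

foreach (var row in result.Value.Table.Rows)
{
    Console.WriteLine($"{row["operation_Name"]} ({row["resultCode"]}): {row["FailureCount"]} failures");
}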

Step 4: Distributed Tracing

View End-to-End Transaction:

union requests, dependencies, exceptions
| where operation_Id == "abc123..."
| project timestamp, itemType, name, duration, success
| order by timestamp asc
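
The operation_Id that stitches the transaction together is the W3C trace ID of the current request. Inside the Program.cs from Step 2 it can be read from Activity.Current and, for example, returned to the caller or attached to a support ticket so the exact transaction can be looked up later. A short sketch using a minimal API endpoint:

using System.Diagnostics;

app.MapGet("/orders/{id}", (string id) =>
{
    // With the default W3C distributed tracing mode, this value matches the
    // operation_Id column in the requests/dependencies/exceptions tables.
    var operationId = Activity.Current?.RootId;

    return Results.Ok(new { id, traceId = operationId });
});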

Service Map Visualization:

Application Insights → Investigate → Application Map

Detect Anomalies:

requests
| where timestamp > ago(7d)
| make-series RequestCount = count() default = 0 on timestamp step 1h
| extend anomalies = series_decompose_anomalies(RequestCount, 1.5)
| mv-expand timestamp to typeof(datetime), RequestCount to typeof(long), anomalies to typeof(double)
| where anomalies != 0

Step 5: Infrastructure Monitoring

VM Metrics:

az monitor metrics list \
  --resource /subscriptions/.../resourceGroups/rg-vms/providers/Microsoft.Compute/virtualMachines/vm-web \
  --metric "Percentage CPU" \
  --start-time 2025-08-04T00:00:00Z \
  --end-time 2025-08-04T23:59:59Z \
  --interval PT1H
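
The same platform metrics can be pulled from code with the Azure.Monitor.Query MetricsQueryClient, for example to build a custom capacity report. A minimal sketch, assuming the SDK's current API shape; the resource ID is the VM path used above:

using Azure.Identity;
using Azure.Monitor.Query;

var metricsClient = new MetricsQueryClient(new DefaultAzureCredential());

// Hourly average CPU for the last 24 hours
var cpu = await metricsClient.QueryResourceAsync(
    "/subscriptions/.../resourceGroups/rg-vms/providers/Microsoft.Compute/virtualMachines/vm-web",
    new[] { "Percentage CPU" },
    new MetricsQueryOptions
    {
        TimeRange = new QueryTimeRange(TimeSpan.FromDays(1)),
        Granularity = TimeSpan.FromHours(1)
    });

foreach (var point in cpu.Value.Metrics[0].TimeSeries[0].Values)
{
    Console.WriteLine($"{point.TimeStamp:t}  avg CPU: {point.Average:F1}%");
}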

Container Insights (AKS):

ContainerLog
| where TimeGenerated > ago(1h)
| where ContainerName == "api-orders"
| where LogEntry contains "error"
| project TimeGenerated, LogEntry

VM Insights:

InsightsMetrics
| where TimeGenerated > ago(1h)
| where Name == "AvailableMB"
| summarize AvgMemoryMB = avg(Val) by Computer
| order by AvgMemoryMB asc

Step 6: Alerting Strategies

Metric Alert (CPU Threshold):

az monitor metrics alert create \
  --name "High CPU Alert" \
  --resource-group rg-monitoring \
  --scopes /subscriptions/.../resourceGroups/rg-vms/providers/Microsoft.Compute/virtualMachines/vm-web \
  --condition "avg Percentage CPU > 80" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action /subscriptions/.../actionGroups/ops-team

Log Alert (Error Rate):

// Alert query
requests
| where timestamp > ago(5m)
| summarize 
    TotalRequests = count(),
    FailedRequests = countif(success == false)
| extend ErrorRate = (FailedRequests * 100.0) / TotalRequests
| where ErrorRate > 5

Azure CLI:

az monitor scheduled-query create \
  --name "High Error Rate" \
  --resource-group rg-monitoring \
  --scopes /subscriptions/.../components/myapi-insights \
  --condition "count 'FailedRequests' > 0" \
  --condition-query FailedRequests="requests | where timestamp > ago(5m) | where success == false" \
  --window-size 5m \
  --evaluation-frequency 5m \
  --severity 2 \
  --action /subscriptions/.../actionGroups/ops-team

Smart Detection (Anomaly Alerts):

Application Insights → Configure → Smart Detection → Enable all

Step 7: Custom Dashboards

Azure Dashboard JSON:

{
  "properties": {
    "lenses": [
      {
        "parts": [
          {
            "position": { "x": 0, "y": 0, "colSpan": 6, "rowSpan": 4 },
            "metadata": {
              "type": "Extension/Microsoft_Azure_Monitoring/PartType/MetricsChartPart",
              "settings": {
                "content": {
                  "metrics": [
                    {
                      "resourceId": "/subscriptions/.../components/myapi-insights",
                      "name": "requests/count",
                      "aggregationType": "Count"
                    }
                  ],
                  "title": "Request Rate"
                }
              }
            }
          }
        ]
      }
    ]
  }
}

Workbook for Executive Summary:

// Active users
customEvents
| where timestamp > ago(30d)
| summarize ActiveUsers = dcount(user_Id)

// Request success rate
requests
| where timestamp > ago(30d)
| summarize 
    TotalRequests = count(),
    SuccessfulRequests = countif(success == true)
| extend SuccessRate = (SuccessfulRequests * 100.0) / TotalRequests

Step 8: Cost Optimization

Sampling Configuration:

// Note: disable the SDK's built-in adaptive sampling first
// (EnableAdaptiveSampling = false in AddApplicationInsightsTelemetry)
// so this custom processor chain takes effect.
builder.Services.Configure<TelemetryConfiguration>(config =>
{
    config.DefaultTelemetrySink.TelemetryProcessorChainBuilder
        .UseAdaptiveSampling(maxTelemetryItemsPerSecond: 5)
        .Build();
});
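
Sampling reduces volume across the board; a custom ITelemetryProcessor can instead drop specific noisy items, such as successful health-probe requests, before they are ever sent. A minimal sketch, assuming a /health endpoint:

using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

public class HealthCheckFilter : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;

    public HealthCheckFilter(ITelemetryProcessor next) => _next = next;

    public void Process(ITelemetry item)
    {
        // Drop successful requests to the health probe; everything else flows on.
        if (item is RequestTelemetry request &&
            request.Success == true &&
            request.Url?.AbsolutePath == "/health")
        {
            return;
        }

        _next.Process(item);
    }
}

// Registration in Program.cs
builder.Services.AddApplicationInsightsTelemetryProcessor<HealthCheckFilter>();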

Data Retention:

az monitor log-analytics workspace update \
  --resource-group rg-monitoring \
  --workspace-name logs-workspace \
  --retention-time 90

Cap Daily Ingestion:

az monitor app-insights component billing update \
  --app myapi-insights \
  --resource-group rg-monitoring \
  --cap 5  # GB per day

Advanced Patterns

Pattern 1: Composite Alerts (Multiple Conditions)

let errorRate = requests
    | where timestamp > ago(5m)
    | summarize ErrorRate = countif(success == false) * 100.0 / count() by timestamp = bin(timestamp, 5m);
let highCpu = performanceCounters
    | where timestamp > ago(5m)
    | where counterName == "% Processor Time"
    | summarize AvgCPU = avg(counterValue) by timestamp = bin(timestamp, 5m);
errorRate
| join kind=inner highCpu on timestamp
| where ErrorRate > 5 and AvgCPU > 80

Pattern 2: Proactive Autoscaling

az monitor autoscale create \
  --resource-group rg-web \
  --resource /subscriptions/.../sites/myapi \
  --name myapi-autoscale \
  --min-count 2 \
  --max-count 10 \
  --count 2

az monitor autoscale rule create \
  --resource-group rg-web \
  --autoscale-name myapi-autoscale \
  --condition "Percentage CPU > 75 avg 5m" \
  --scale out 1 \
  --cooldown 5

Pattern 3: Live Metrics Stream

// Enable Live Metrics
builder.Services.AddApplicationInsightsTelemetry(options =>
{
    options.EnableQuickPulseMetricStream = true;
});

Troubleshooting

Issue: No telemetry appearing
Solution: Verify connection string; check firewall rules; ensure SDK version compatibility

Issue: High ingestion costs
Solution: Enable adaptive sampling; filter noisy telemetry; reduce retention period

Issue: Missing dependency data
Solution: Ensure SQL/HTTP auto-instrumentation enabled; check dependency tracking configuration

Best Practices

  • Use structured logging (ILogger with scopes; see the sketch after this list)
  • Implement custom events for business metrics
  • Set appropriate sampling rates (5-10 items/sec for most apps)
  • Create actionable alerts (avoid alert fatigue)
  • Use workbooks for stakeholder reporting
  • Regularly review and optimize KQL queries
  • Tag resources with environment/owner for filtering
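
For the first point, ILogger scopes attach contextual properties to every log entry written inside the scope; with the Application Insights logger provider these surface as custom dimensions on the trace records. A minimal sketch; the OrderService class is illustrative:

using System.Collections.Generic;
using Microsoft.Extensions.Logging;

public class OrderService
{
    private readonly ILogger<OrderService> _logger;

    public OrderService(ILogger<OrderService> logger) => _logger = logger;

    public void Process(string orderId, string customerId)
    {
        // Scope properties are attached to every log entry inside the block
        // and appear as customDimensions when scope inclusion is enabled (the default).
        using (_logger.BeginScope(new Dictionary<string, object>
        {
            ["OrderId"] = orderId,
            ["CustomerId"] = customerId
        }))
        {
            _logger.LogInformation("Processing order {OrderId} for {CustomerId}", orderId, customerId);
        }
    }
}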

Key Takeaways

  • Application Insights provides automatic instrumentation for .NET/Node.js/Java.
  • KQL enables powerful log analysis and correlation.
  • Smart Detection identifies anomalies without manual configuration.
  • Distributed tracing visualizes end-to-end request flows.

Next Steps

  • Implement SLA-based alerts with multi-resource queries
  • Explore Azure Monitor for containers (AKS insights)
  • Integrate with Azure DevOps for deployment tracking


Is your system observable enough to debug production issues?