Testing Azure Container Apps

Introduction

Reliable testing is critical for Azure Container Apps (ACA) workloads that scale dynamically, rely on Dapr sidecars, or integrate event-driven patterns (HTTP, Service Bus, Event Grid). This guide establishes an end-to-end testing strategy covering unit, integration, contract, performance, resilience, and security validation with automation examples.

Prerequisites

  • Azure subscription (free trial)
  • Azure CLI (az) + containerapp extension
  • GitHub Actions or Azure DevOps pipeline access
  • Node.js / .NET SDK (sample services/tests)
  • k6 (load testing) or Azure Load Testing resource
  • Trivy / Microsoft Defender for Cloud (image scanning)

Testing Strategy Overview

| Layer | Scope | Tools | Goal |
| --- | --- | --- | --- |
| Unit | Functions, methods | xUnit / Jest | Deterministic logic correctness |
| Integration | Service + dependent resource (Redis, Cosmos) | Testcontainers / Docker Compose | Resource wiring & data behavior |
| Contract | API surface & schemas | OpenAPI diff, Pact | Prevent breaking consumer changes |
| End-to-End | Full workflow (HTTP → event → persistence) | Playwright / REST clients | Validate business scenarios |
| Load / Performance | RU, latency, concurrency, scale-out | k6 / Azure Load Testing | Capacity & auto-scaling behavior |
| Resilience | Fault injection, timeouts | Chaos Studio (future) / custom scripts | Graceful degradation |
| Security | Image + dependency scan | Trivy, Defender for Cloud | Vulnerability & misconfig detection |
| Observability | Telemetry completeness | Application Insights / OpenTelemetry | Trace coverage & useful metrics |

Architecture Under Test

flowchart LR
    Client --> Ingress[HTTP Ingress]
    Ingress --> AppA[API Container]
    AppA --> DaprA[(Dapr Sidecar)]
    DaprA --> Redis[(Redis Cache)]
    AppA --> Queue[(Service Bus Queue)]
    Queue --> Worker[Worker Container]
    Worker --> DaprW[(Dapr Sidecar)]
    DaprW --> Cosmos[(Cosmos DB)]
    AppA --> Insights[(App Insights)]
    Worker --> Insights

Local Integration Setup (Testcontainers Example)

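// xUnit fixture that starts Redis in a container and connects to a Cosmos DB emulator.
// Assumes the Cosmos emulator is already running on localhost:8081; only Redis is started here.
using System.Threading.Tasks;
using DotNet.Testcontainers.Builders;
using DotNet.Testcontainers.Containers;
using Microsoft.Azure.Cosmos;
using Xunit;

public class IntegrationFixture : IAsyncLifetime {
    public string RedisConnection { get; private set; }
    public CosmosClient Cosmos { get; private set; }
    private IContainer _redis;

    public async Task InitializeAsync() {
        _redis = new ContainerBuilder()
            .WithImage("redis:7")
            .WithPortBinding(6379, true) // map to a random free host port
            .WithWaitStrategy(Wait.ForUnixContainer().UntilPortIsAvailable(6379))
            .Build();
        await _redis.StartAsync();
        RedisConnection = $"{_redis.Hostname}:{_redis.GetMappedPublicPort(6379)}";
        Cosmos = new CosmosClient("https://localhost:8081", "C2F...=="); // emulator key
    }

    public async Task DisposeAsync() {
        Cosmos?.Dispose();
        await _redis.DisposeAsync(); // stops and removes the container
    }
}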

Contract Testing (Pact / OpenAPI Diff)

openapi-diff previous.yaml current.yaml --fail-on-changed

Add consumer-driven contract tests for critical event payloads (e.g., JSON schema versions of queue messages).
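
For event payloads, a contract check can start out very small. The sketch below is a hand-rolled illustration rather than Pact itself: the contract name orderCreatedV1 and its fields are hypothetical, and a real setup would more likely use Pact message contracts or a JSON Schema validator. It only shows the core idea that a consumer pins the fields and types it depends on, while tolerating extra fields.

```javascript
// Hypothetical contract for an "order created" queue message, version 1.
// Field names are illustrative assumptions, not taken from a real service.
const orderCreatedV1 = {
  schemaVersion: 'number',
  orderId: 'string',
  totalCents: 'number',
};

// Returns a list of contract violations; an empty list means the payload conforms.
// Fields not named in the contract are ignored, so producers can add data safely.
function validatePayload(contract, payload) {
  const errors = [];
  for (const [field, expectedType] of Object.entries(contract)) {
    if (!(field in payload)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof payload[field] !== expectedType) {
      errors.push(`wrong type for ${field}: expected ${expectedType}, got ${typeof payload[field]}`);
    }
  }
  return errors;
}

const good = { schemaVersion: 1, orderId: 'o-123', totalCents: 4200, note: 'extra field' };
const bad = { schemaVersion: 1, orderId: 123 };

console.log(validatePayload(orderCreatedV1, good));
console.log(validatePayload(orderCreatedV1, bad));
```

Running a check like this in the producer's test suite against a sample of each message it emits is one way to catch a renamed or retyped field before consumers do.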

Performance & Load (k6)

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [ { duration: '1m', target: 50 }, { duration: '3m', target: 200 }, { duration: '1m', target: 0 } ],
  // p95 is an aggregate over all requests, so assert it as a threshold, not per request.
  thresholds: { http_req_duration: ['p(95)<300'] },
};

export default () => {
  const res = http.get(`${__ENV.BASE_URL}/api/orders`);
  check(res, { 'status 200': (r) => r.status === 200 });
  sleep(1);
};

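Run the script against an app that has a scale rule configured, and watch the replica count and KEDA scaling events while the load ramps up and down (for example with az containerapp replica list).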
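
When reading the results, it helps to have a rough expectation of how many replicas the scaler should settle on. The sketch below assumes an HTTP scale rule with a concurrent-requests target and models the scaler as ceil(load / target) clamped to the min and max replica settings. That is a simplification: polling intervals, cooldown, and the gap between k6 virtual users and true concurrent requests are all ignored, and the numbers used are made up.

```javascript
// Rough estimate of the replica count an HTTP concurrency rule should converge on.
// Assumption: desired = ceil(concurrentRequests / targetPerReplica), clamped to [min, max].
function expectedReplicas(concurrentRequests, targetPerReplica, minReplicas, maxReplicas) {
  const desired = Math.ceil(concurrentRequests / targetPerReplica);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// Hypothetical settings: 20 concurrent requests per replica, 1 to 8 replicas.
for (const load of [0, 50, 200]) {
  console.log(`${load} concurrent requests -> ${expectedReplicas(load, 20, 1, 8)} replicas`);
}
```

If the observed replica count sits well above or below this kind of estimate for a sustained period, the scale rule or its threshold is worth a second look.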

Resilience Testing

Fault injections:

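  • Force replicas to be recreated (az containerapp revision restart), or take a whole revision offline (az containerapp revision deactivate). Neither command targets a single replica, so plan the experiment at revision level.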
  • Introduce latency via a test-only middleware (delay 500ms) to verify timeout & retry policies.
  • Simulate Redis outage by blocking port locally; confirm fallback logic.
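
The behavior these injections are meant to exercise can be sketched as a timeout, retry, and fallback wrapper. This is a minimal illustration, not a recommendation of a specific policy: the helper names withTimeout and callWithRetry, the attempt counts, and the simulated slow dependency are all assumptions made for the example.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Calls `operation` up to `attempts` times, each bounded by `timeoutMs`.
// If every attempt fails or times out, returns the result of `fallback()`.
async function callWithRetry(operation, { attempts, timeoutMs, fallback }) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await withTimeout(operation(attempt), timeoutMs);
    } catch (err) {
      // Attempt failed or timed out; fall through to the next attempt.
    }
  }
  return fallback();
}

// Simulated dependency: 500 ms of injected latency on the first call, fast afterwards.
const slowOnce = (attempt) =>
  new Promise((resolve) => setTimeout(() => resolve(`ok on attempt ${attempt}`), attempt === 1 ? 500 : 10));

// Simulated outage: every call takes longer than the timeout allows.
const alwaysSlow = () => new Promise((resolve) => setTimeout(() => resolve('late'), 500));

callWithRetry(slowOnce, { attempts: 3, timeoutMs: 100, fallback: () => 'cached value' }).then(console.log);
callWithRetry(alwaysSlow, { attempts: 2, timeoutMs: 50, fallback: () => 'cached value' }).then(console.log);
```

A resilience test then asserts two things: that the caller recovers when the fault is transient, and that it degrades to the fallback instead of hanging when the fault persists.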

Security & Compliance

trivy image ghcr.io/org/app-api:latest --severity HIGH,CRITICAL --exit-code 1
trivy fs . --ignore-unfixed

Integrate Defender for Cloud recommendations; fail pipeline on critical CVEs.
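
If the pipeline needs more nuance than a single exit code (for example, reporting HIGH findings but blocking only on CRITICAL), one option is to have Trivy write a JSON report with --format json and apply the gate in a small script. The sketch below assumes the report's usual shape, a Results array whose entries may carry a Vulnerabilities array with a Severity field; the sample report and its IDs are made-up placeholders.

```javascript
// Counts vulnerabilities per severity in a parsed Trivy JSON report.
// Targets without findings may omit the Vulnerabilities key, hence the `?? []`.
function countBySeverity(report) {
  const counts = {};
  for (const result of report.Results ?? []) {
    for (const vuln of result.Vulnerabilities ?? []) {
      counts[vuln.Severity] = (counts[vuln.Severity] ?? 0) + 1;
    }
  }
  return counts;
}

// True when the report contains at least one finding at a blocking severity.
function shouldFail(report, blocking = ['CRITICAL']) {
  const counts = countBySeverity(report);
  return blocking.some((severity) => (counts[severity] ?? 0) > 0);
}

// Trimmed, made-up report; the IDs are placeholders, not real CVEs.
const sampleReport = {
  Results: [
    {
      Target: 'app-api (debian)',
      Vulnerabilities: [
        { VulnerabilityID: 'CVE-0000-0001', Severity: 'HIGH' },
        { VulnerabilityID: 'CVE-0000-0002', Severity: 'CRITICAL' },
      ],
    },
    { Target: 'package-lock.json' },
  ],
};

console.log(countBySeverity(sampleReport));
console.log(shouldFail(sampleReport) ? 'gate: fail' : 'gate: pass');
```

In a pipeline, the script would read the report file, print the counts, and exit non-zero when shouldFail returns true.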

CI/CD Pipeline Snippet (GitHub Actions)

# Assumes k6 and trivy are available on the runner (install steps omitted).
name: aca-ci
on: [push]
jobs:
  build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci && npm test
      - name: Unit tests (.NET)
        run: dotnet test src/Api.Tests/Api.Tests.csproj --configuration Release
      - name: Load test (k6 smoke)
        run: BASE_URL=${{ secrets.APP_URL }} k6 run tests/load/smoke.js
      - name: Image build
        run: docker build -t ghcr.io/org/app-api:${{ github.sha }} .
      - name: Security scan
        run: trivy image ghcr.io/org/app-api:${{ github.sha }} --severity HIGH,CRITICAL --exit-code 1
      - name: Push image
        env:
          CR_PAT: ${{ secrets.CR_PAT }} # registry token; the secret name is an assumption
        run: echo "$CR_PAT" | docker login ghcr.io -u USER --password-stdin && docker push ghcr.io/org/app-api:${{ github.sha }}
  deploy:
    needs: build-test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Azure Login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Deploy ACA
        run: |
          az containerapp update \
            --name app-api \
            --resource-group rg-aca-prod \
            --image ghcr.io/org/app-api:${{ github.sha }} \
            --set-env-vars APP_ENV=prod

Observability

  • Enable Dapr tracing + OpenTelemetry exporter to Application Insights.
  • Track custom metrics: queueLag, redisHitRate, replicaCount.
  • Alerts: p95 latency, HTTP 5xx rate, throttled Service Bus calls.

Sample Kusto Query (failed requests trend):

requests
| where timestamp > ago(1h)
| where resultCode startswith "5"
| summarize count() by bin(timestamp, 5m)
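
The same aggregation is easy to reproduce in a test when you want to assert on exported telemetry rather than eyeball a chart. The sketch below mirrors the query's filter and 5-minute binning over an in-memory array; the sample records are invented and the function name failedPerBin is hypothetical.

```javascript
const FIVE_MIN_MS = 5 * 60 * 1000;

// Counts failed (5xx) requests per 5-minute bin, keyed by the bin's start time in ISO format.
function failedPerBin(requests, binMs = FIVE_MIN_MS) {
  const bins = new Map();
  for (const { timestamp, resultCode } of requests) {
    if (!String(resultCode).startsWith('5')) continue; // same filter as the query
    const binStart = Math.floor(Date.parse(timestamp) / binMs) * binMs;
    const key = new Date(binStart).toISOString();
    bins.set(key, (bins.get(key) ?? 0) + 1);
  }
  return Object.fromEntries(bins);
}

// Invented sample data for illustration.
const sampleRequests = [
  { timestamp: '2024-01-01T10:01:00Z', resultCode: '500' },
  { timestamp: '2024-01-01T10:04:59Z', resultCode: '503' },
  { timestamp: '2024-01-01T10:05:00Z', resultCode: '502' },
  { timestamp: '2024-01-01T10:06:00Z', resultCode: '200' },
];

console.log(failedPerBin(sampleRequests));
```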

Troubleshooting Matrix

| Symptom | Likely Cause | Action | Preventative |
| --- | --- | --- | --- |
| High cold start latency | Image size too large | Optimize layers, enable caching | Multistage builds & slim base |
| Replica thrash | Misconfigured KEDA scaling metric | Adjust min/max replicas & cooldown | Define stable threshold |
| 429 / throttling | Under-provisioned backing services | Increase capacity / caching | RU & concurrency monitoring |
| Missing traces | Dapr tracing disabled | Enable tracing config | Version-controlled observability config |
| Failed deploy due to CVE | Critical vulnerability found | Patch dependency / rebuild image | Scheduled image scans |

Best Practices

  • Keep images lean (distroless or slim base).
  • Externalize config via secrets & env vars; rotate regularly.
  • Use multiple-revision mode for blue/green; test the new revision under load before shifting traffic to it.
  • Automate regression smoke after deploy (status endpoint + key business API).
  • Tag images with git sha + semantic version.
  • Enforce resource limits (CPU/memory) to avoid noisy neighbor issues.
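
The post-deploy smoke check mentioned above can be a very small script. The sketch below is one possible shape, with assumptions called out: the /healthz path is hypothetical, /api/orders is borrowed from the load test, and the fetch function is injectable so the logic can be exercised without a network (the default relies on the global fetch in Node.js 18+).

```javascript
// Runs each check against baseUrl and returns a list of failure messages.
// `fetchFn` is injectable so the logic can be tested with a stub.
async function runSmoke(baseUrl, checks, fetchFn = fetch) {
  const failures = [];
  for (const { path, expectStatus } of checks) {
    try {
      const res = await fetchFn(`${baseUrl}${path}`);
      if (res.status !== expectStatus) {
        failures.push(`${path}: expected ${expectStatus}, got ${res.status}`);
      }
    } catch (err) {
      failures.push(`${path}: ${err.message}`);
    }
  }
  return failures;
}

// Hypothetical checks: a status endpoint plus one key business API.
const smokeChecks = [
  { path: '/healthz', expectStatus: 200 },
  { path: '/api/orders', expectStatus: 200 },
];

// Only hits the network when BASE_URL is provided, e.g. from the deploy job.
if (process.env.BASE_URL) {
  runSmoke(process.env.BASE_URL, smokeChecks).then((failures) => {
    failures.forEach((failure) => console.error(failure));
    process.exitCode = failures.length > 0 ? 1 : 0;
  });
}
```

Run as a final pipeline step, a non-zero exit code gives the deploy job a clear signal to stop or roll traffic back to the previous revision.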

Key Takeaways

  • Multi-layer testing prevents late production surprises.
  • Load & resilience tests validate autoscale + failure recovery.
  • Security scanning must be gating, not advisory.
  • Observability coverage (logs, metrics, traces) enables fast MTTR.

Next Steps

  • Add chaos experiments (network latency, pod kill) once Chaos Studio supports ACA.
  • Integrate contract tests in CI for event payloads.
  • Expand performance benchmarks to scheduled nightly runs.
