Testing Azure Container Apps

Introduction

Reliable testing is critical for Azure Container Apps (ACA) workloads that scale dynamically, rely on Dapr sidecars, or integrate event-driven patterns (HTTP, Service Bus, Event Grid). This guide establishes an end-to-end testing strategy covering unit, integration, contract, performance, resilience, and security validation with automation examples.

Prerequisites

Azure subscription (free trial)
Azure CLI (az) + containerapp extension
GitHub Actions or Azure DevOps pipeline access
Node.js / .NET SDK (sample services/tests)
k6 (load testing) or Azure Load Testing resource
Trivy / Microsoft Defender for Cloud (image scanning)

Testing Strategy Overview

Layer	Scope	Tools	Goal
Unit	Functions, methods	xUnit / Jest	Deterministic logic correctness
Integration	Service + dependent resource (Redis, Cosmos)	Testcontainers / Docker Compose	Resource wiring & data behavior
Contract	API surface & schemas	OpenAPI diff, Pact	Prevent breaking consumer changes
End-to-End	Full workflow (HTTP → event → persistence)	Playwright / REST clients	Validate business scenarios
Load / Performance	RU, latency, concurrency, scale-out	k6 / Azure Load Testing	Capacity & auto-scaling behavior
Resilience	Fault injection, timeouts	Chaos Studio (future) / custom scripts	Graceful degradation
Security	Image + dependency scan	Trivy, Defender for Cloud	Vulnerability & misconfig detection
Observability	Telemetry completeness	Application Insights / OpenTelemetry	Trace coverage & useful metrics

Architecture Under Test

flowchart LR Client-->Ingress[HTTP Ingress] Ingress-->AppA[API Container] AppA-->DaprA[(Dapr Sidecar)] DaprA-->Redis[(Redis Cache)] AppA-->Queue[(Service Bus Queue)] Queue-->Worker[Worker Container] Worker-->DaprW[(Dapr Sidecar)] DaprW-->Cosmos[(Cosmos DB)] AppA-->Insights[(App Insights)] Worker-->Insights

Local Integration Setup (Testcontainers Example)

// xUnit fixture spinning Redis + Cosmos Emulator (pseudo)
public class IntegrationFixture : IAsyncLifetime {
    public string RedisConnection { get; private set; }
    public CosmosClient Cosmos { get; private set; }
    private IContainer _redis;
    public async Task InitializeAsync() {
        _redis = new ContainerBuilder()
            .WithImage("redis:7")
            .WithPortBinding(6379, true)
            .Build();
        await _redis.StartAsync();
        RedisConnection = $"localhost:{_redis.GetMappedPublicPort(6379)}";
        Cosmos = new CosmosClient("https://localhost:8081", "C2F...=="); // emulator key
    }
    public async Task DisposeAsync() => await _redis.StopAsync();
}

Contract Testing (Pact / OpenAPI Diff)

openapi-diff previous.yaml current.yaml --fail-on-changed

Add consumer-driven contract tests for critical event payloads (e.g., JSON schema versions of queue messages).

Performance & Load (k6)

import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = { stages: [ { duration: '1m', target: 50 }, { duration: '3m', target: 200 }, { duration: '1m', target: 0 } ] };
export default () => {
	const res = http.get(`${__ENV.BASE_URL}/api/orders`);
	check(res, { 'status 200': r => r.status === 200, 'p95 < 300ms': r => r.timings.duration < 300 });
	sleep(1);
};

Run with autoscale scenario to observe container replica count and KEDA scaling events.

Resilience Testing

Fault injections:

Terminate one replica (az containerapp revision deactivate).
Introduce latency via a test-only middleware (delay 500ms) to verify timeout & retry policies.
Simulate Redis outage by blocking port locally; confirm fallback logic.

Security & Compliance

trivy image ghcr.io/org/app-api:latest --severity HIGH,CRITICAL --exit-code 1
trivy fs . --ignore-unfixed

Integrate Defender for Cloud recommendations; fail pipeline on critical CVEs.

CI/CD Pipeline Snippet (GitHub Actions)

name: aca-ci
on: [push]
jobs:
	build-test:
		runs-on: ubuntu-latest
		steps:
			- uses: actions/checkout@v4
			- uses: actions/setup-node@v4
				with: { node-version: '20' }
			- run: npm ci && npm test
			- name: Unit tests (.NET)
				run: dotnet test src/Api.Tests/Api.Tests.csproj --configuration Release
			- name: Load test (k6 smoke)
				run: BASE_URL=${{ secrets.APP_URL }} k6 run tests/load/smoke.js
			- name: Image build
				run: docker build -t ghcr.io/org/app-api:${{ github.sha }} .
			- name: Security scan
				run: trivy image ghcr.io/org/app-api:${{ github.sha }} --severity HIGH,CRITICAL --exit-code 1
			- name: Push image
				run: echo $CR_PAT | docker login ghcr.io -u USER --password-stdin && docker push ghcr.io/org/app-api:${{ github.sha }}
	deploy:
		needs: build-test
		runs-on: ubuntu-latest
		steps:
			- uses: actions/checkout@v4
			- name: Azure Login
				uses: azure/login@v2
				with:
					creds: ${{ secrets.AZURE_CREDENTIALS }}
			- name: Deploy ACA
				run: |
					az containerapp update \
						--name app-api \
						--resource-group rg-aca-prod \
						--image ghcr.io/org/app-api:${{ github.sha }} \
						--set-env-vars APP_ENV=prod

Observability

Enable Dapr tracing + OpenTelemetry exporter to Application Insights.
Track custom metrics: queueLag, redisHitRate, replicaCount.
Alerts: p95 latency, HTTP 5xx rate, throttled Service Bus calls.

Sample Kusto Query (failed requests trend):

requests
| where timestamp > ago(1h)
| where resultCode startswith "5"
| summarize count() by bin(timestamp, 5m)

Troubleshooting Matrix

Symptom	Likely Cause	Action	Preventative
High cold start latency	Image size too large	Optimize layers, enable caching	Multistage builds & slim base
Replica thrash	Misconfigured KEDA scaling metric	Adjust min/max replicas & cooldown	Define stable threshold
429 / throttling	Under-provisioned backing services	Increase capacity / caching	RU & concurrency monitoring
Missing traces	Dapr tracing disabled	Enable tracing config	Version-controlled observability config
Failed deploy due to CVE	Critical vulnerability found	Patch dependency / rebuild image	Scheduled image scans

Best Practices

Keep images lean (distroless or slim base).
Externalize config via secrets & env vars; rotate regularly.
Use revision mode for blue/green; test new revision under load before traffic shift.
Automate regression smoke after deploy (status endpoint + key business API).
Tag images with git sha + semantic version.
Enforce resource limits (CPU/memory) to avoid noisy neighbor issues.

Key Takeaways

Multi-layer testing prevents late production surprises.
Load & resilience tests validate autoscale + failure recovery.
Security scanning must be gating, not advisory.
Observability coverage (logs, metrics, traces) enables fast MTTR.

References

Next Steps

Add chaos experiments (network latency, pod kill) once Chaos Studio supports ACA.
Integrate contract tests in CI for event payloads.
Expand performance benchmarks to scheduled nightly runs.

[Detailed explanation with context]

# Example Azure CLI command
az group create --name myResourceGroup --location eastus

Step 2: [Second Major Step]

[Continue with clear, actionable steps]

Step 3: [Third Major Step]

[Add screenshots or diagrams where helpful]

Best Practices

[Key best practice 1]
[Key best practice 2]
[Key best practice 3]

Common Issues & Troubleshooting

Issue: [Common problem]
Solution: [How to fix it]

Key Takeaways

✅ [Main learning point 1]
✅ [Main learning point 2]
✅ [Main learning point 3]

Next Steps

[Suggested follow-up topic or action]
[Link to related Azure service]

Additional Resources

What are your experiences with [this topic]? Share your thoughts in the comments below!