APIM — Response Caching and Cache Invalidation Strategies

Implementing response caching, cache invalidation, and cache key strategies for optimal API performance in Azure APIM.

A comprehensive guide to caching in Azure API Management — from basic response caching to advanced patterns with external Redis, invalidation strategies, and production monitoring.


1. Why Caching Matters in API Management

APIM sits between consumers and backends, making it the ideal layer to intercept repeated requests.

Performance: Cached responses are typically served in single-digit milliseconds, versus hundreds of milliseconds for backend calls, because a hit skips the backend network hop, database connection, and compute.

Cost: A product catalog API with 95% cache hit ratio reduces backend DB queries by 95%. Fewer requests means lower App Service/AKS/Function costs and reduced egress bandwidth.
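The arithmetic behind that claim is easy to check. The sketch below is a back-of-the-envelope model, not a measurement: the function names and the sample numbers (2000 req/s, 5 ms cached, 200 ms uncached) are made up for illustration.

```python
def backend_load(requests_per_sec: float, hit_ratio: float) -> float:
    """Requests per second that still reach the backend."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return requests_per_sec * (1.0 - hit_ratio)


def effective_latency_ms(hit_ratio: float, cached_ms: float, backend_ms: float) -> float:
    """Average latency seen by callers for a given hit ratio."""
    return hit_ratio * cached_ms + (1.0 - hit_ratio) * backend_ms


if __name__ == "__main__":
    # Hypothetical traffic profile, for illustration only.
    print(backend_load(2000, 0.95))
    print(effective_latency_ms(0.95, 5, 200))
```

Note how sensitive the result is to the hit ratio: dropping from 95% to 90% doubles the traffic that reaches the backend.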

Scalability: Cache absorbs traffic spikes (flash sales, viral content) and shields backends from thundering herd problems. External Redis scales independently of APIM instances.

When NOT to cache: Real-time data with zero staleness tolerance, user-specific mutations (POST/PUT/DELETE), responses with sensitive per-user data that could leak, streaming/SSE endpoints.


2. Architecture: Cache Flow

┌──────────────────────────────────────────────────────────────┐
│                    API Consumer Request                      │
└─────────────────────────┬────────────────────────────────────┘
                          ▼
┌──────────────────────────────────────────────────────────────┐
│              Azure API Management Gateway                    │
│                                                              │
│  INBOUND:  cache-lookup ──► Cache Hit? ──YES──► Return       │
│                                  │                           │
│                                  NO                          │
│                                  ▼                           │
│                          Forward to Backend                  │
│                                  │                           │
│  OUTBOUND: cache-store ◄─────────┘                           │
│                                                              │
│  STORAGE:  ┌─────────────────┐  ┌────────────────────────┐   │
│            │ Internal Cache  │  │ External Redis Cache   │   │
│            │ (per-instance)  │  │ (shared, persistent)   │   │
│            └─────────────────┘  └────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘

Flow: Request arrives → cache-lookup checks cache → hit returns immediately → miss forwards to backend → cache-store saves response → subsequent identical requests served from cache until TTL expires.
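To make the flow concrete, here is a minimal in-memory model of the lookup/store cycle. It is a teaching sketch, not APIM's implementation: `TtlCache`, `handle`, and the injectable clock are hypothetical names chosen for this example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """In-memory stand-in for the gateway cache (not APIM's implementation)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def lookup(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def store(self, key: str, value: str, duration: float) -> None:
        self._entries[key] = (self._clock() + duration, value)


def handle(cache: TtlCache, key: str, backend: Callable[[], str],
           duration: float) -> Tuple[str, str]:
    """Mirror the flow: lookup first, then backend call plus store on a miss."""
    cached = cache.lookup(key)
    if cached is not None:
        return "HIT", cached
    body = backend()
    cache.store(key, body, duration)
    return "MISS", body
```

The first call for a key returns `MISS` and populates the cache; identical calls return `HIT` until `duration` seconds have passed.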


3. Types of Caching in APIM

Internal Cache (Built-in)

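In-memory cache built into the gateway. Zero setup, sub-millisecond latency. Capacity depends on the tier (Developer: 10MB, Standard: 1GB, Premium: 5GB per unit); the Consumption tier has no internal cache. Contents are volatile (lost on restart) and are shared only among units in the same region of one APIM service, not across regions or services.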

External Cache (Azure Redis)

Shared across all APIM instances and regions. Survives restarts, supports up to 1.2TB (Premium clustered), enables advanced features like pub/sub invalidation. Requires provisioning and connection setup.

Use Case Scenarios

Scenario                           | Use
-----------------------------------|----------------
Single instance, low traffic       | Internal
Multi-instance / multi-region      | External Redis
Cache entries > 1MB                | External Redis
Need persistence across restarts   | External Redis
Sub-millisecond latency critical   | Internal

4. Step-by-Step Implementation

4.1 Basic Response Caching

<policies>
    <inbound>
        <base />
        <cache-lookup vary-by-developer="false"
                      vary-by-developer-groups="false"
                      caching-type="prefer-external"
                      downstream-caching-type="none">
            <vary-by-header>Accept</vary-by-header>
            <vary-by-query-parameter>api-version</vary-by-query-parameter>
        </cache-lookup>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
        <cache-store duration="3600" />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

Key attributes: duration (TTL seconds), caching-type (internal/external/prefer-external), downstream-caching-type (controls Cache-Control header: none/private/public).

4.2 Vary-By Strategies

<cache-lookup vary-by-developer="false" vary-by-developer-groups="false">
    <!-- Separate entries per content type and language -->
    <vary-by-header>Accept</vary-by-header>
    <vary-by-header>Accept-Language</vary-by-header>
    <!-- Pagination and filtering -->
    <vary-by-query-parameter>page</vary-by-query-parameter>
    <vary-by-query-parameter>pageSize</vary-by-query-parameter>
    <vary-by-query-parameter>category</vary-by-query-parameter>
    <!-- Wildcard: vary by ALL query parameters. Use this or the explicit list above, not both -->
    <vary-by-query-parameter>*</vary-by-query-parameter>
    <!-- For per-developer entries, set vary-by-developer="true" on cache-lookup -->
</cache-lookup>

Custom cache key with C# expression:

<cache-lookup vary-by-developer="false" vary-by-developer-groups="false">
    <vary-by-header>Accept</vary-by-header>
    <vary-by-custom>@{
        var tenantId = context.Request.Headers
            .GetValueOrDefault("X-Tenant-Id", "default");
        var role = context.Request.Headers
            .GetValueOrDefault("X-User-Role", "anonymous");
        return $"{tenantId}:{role}";
    }</vary-by-custom>
</cache-lookup>

4.3 Cache Key Composition

Default key components: HTTP method + URL path + vary-by-query-parameter values + vary-by-header values + developer identity.
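The sketch below shows why the choice of vary-by dimensions matters: only the configured query parameters and headers enter the key, so requests that differ in anything else share an entry. APIM's real key format is internal; `compose_cache_key` and its `|`-separated layout are assumptions made for illustration.

```python
from typing import Iterable, Mapping


def compose_cache_key(
    method: str,
    path: str,
    query: Mapping[str, str],
    headers: Mapping[str, str],
    vary_by_query: Iterable[str] = (),
    vary_by_header: Iterable[str] = (),
    custom: str = "",
) -> str:
    """Build a deterministic key from only the configured vary-by dimensions."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    query_part = "&".join(
        f"{name}={query.get(name, '')}" for name in sorted(vary_by_query)
    )
    header_part = "&".join(
        f"{name.lower()}={lowered.get(name.lower(), '')}"
        for name in sorted(vary_by_header)
    )
    return "|".join([method.upper(), path, query_part, header_part, custom])
```

With `vary_by_query=["page"]`, two requests that differ only in a tracking parameter such as `utm` map to the same key, while `page=1` and `page=2` get separate entries.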

Geographic-aware key:

<vary-by-custom>@{
    var region = context.Request.Headers
        .GetValueOrDefault("X-Forwarded-Region", context.Deployment.Region);
    var currency = context.Request.Headers
        .GetValueOrDefault("X-Currency", "USD");
    return $"{region}:{currency}";
}</vary-by-custom>

4.4 Conditional Caching Based on Response Codes

<outbound>
    <base />
    <choose>
        <when condition="@(context.Response.StatusCode == 200)">
            <cache-store duration="3600" />
        </when>
        <when condition="@(context.Response.StatusCode == 404)">
            <!-- cache-store only caches 200 OK unless cache-response is set to true -->
            <cache-store duration="300" cache-response="true" />
        </when>
        <!-- All other 4xx/5xx responses are not cached -->
    </choose>
</outbound>

Dynamic TTL respecting backend Cache-Control:

<outbound>
    <base />
    <choose>
        <when condition="@(context.Response.StatusCode >= 200 && context.Response.StatusCode < 300)">
            <!-- cache-response="true" is needed to cache 2xx codes other than 200 -->
            <cache-store cache-response="true" duration="@{
                var cc = context.Response.Headers.GetValueOrDefault("Cache-Control", "");
                var maxAge = cc.Split(',').Select(s => s.Trim())
                    .FirstOrDefault(s => s.StartsWith("max-age="));
                int ttl;
                // Fall back to 1 hour when max-age is missing or not a number
                return maxAge != null && int.TryParse(maxAge.Split('=')[1], out ttl)
                    ? ttl : 3600;
            }" />
        </when>
    </choose>
</outbound>
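The parsing rule is easier to test outside the gateway. The sketch below mirrors the max-age extraction and, as an extension that the policy above does not implement, returns 0 for `no-store` and `private` so the caller can skip caching. The function name and the default of 3600 seconds are illustrative.

```python
import re

# Match a max-age directive that stands alone between commas.
_MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*(\d+)\s*(?:,|$)", re.IGNORECASE)


def ttl_from_cache_control(header: str, default: int = 3600) -> int:
    """Return max-age in seconds, 0 when caching is forbidden, else the default."""
    directives = {part.strip().lower() for part in header.split(",")}
    if "no-store" in directives or "private" in directives:
        return 0
    match = _MAX_AGE.search(header)
    return int(match.group(1)) if match else default
```

A malformed value such as `max-age=abc` falls back to the default instead of raising, which is the same failure mode the policy should have.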

4.5 Fragment Caching (cache-lookup-value / cache-store-value)

Fragment caching stores individual values — useful for tokens, config, or partial data:

<inbound>
    <base />
    <cache-lookup-value key="@("backend-token:" +
        context.Request.Headers.GetValueOrDefault("X-Tenant-Id", "default"))"
        variable-name="cachedToken"
        caching-type="prefer-external" />
    <choose>
        <when condition="@(!context.Variables.ContainsKey("cachedToken"))">
            <send-request mode="new" response-variable-name="tokenResponse">
                <set-url>https://login.microsoftonline.com/tenant/oauth2/v2.0/token</set-url>
                <set-method>POST</set-method>
                <set-header name="Content-Type" exists-action="override">
                    <value>application/x-www-form-urlencoded</value>
                </set-header>
                <set-body>grant_type=client_credentials&amp;client_id={{client-id}}&amp;client_secret={{client-secret}}&amp;scope={{scope}}</set-body>
            </send-request>
            <set-variable name="cachedToken"
                value="@(((IResponse)context.Variables["tokenResponse"]).Body.As<JObject>()["access_token"].ToString())" />
            <cache-store-value key="@("backend-token:" +
                context.Request.Headers.GetValueOrDefault("X-Tenant-Id", "default"))"
                value="@((string)context.Variables["cachedToken"])"
                duration="3300"
                caching-type="prefer-external" />
        </when>
    </choose>
    <set-header name="Authorization" exists-action="override">
        <value>@("Bearer " + (string)context.Variables["cachedToken"])</value>
    </set-header>
</inbound>

5. External Cache with Azure Redis

5.1 Provisioning

az group create --name rg-apim-cache --location eastus2

# The non-SSL port (6379) is disabled by default, so no flag is needed to turn it off.
az redis create \
  --name redis-apim-cache-prod \
  --resource-group rg-apim-cache \
  --location eastus2 \
  --sku Premium --vm-size p1 \
  --minimum-tls-version 1.2 \
  --redis-configuration '{"maxmemory-policy": "allkeys-lru"}'

5.2 Connecting to APIM

REDIS_HOST=$(az redis show --name redis-apim-cache-prod \
  --resource-group rg-apim-cache --query hostName -o tsv)
REDIS_KEY=$(az redis list-keys --name redis-apim-cache-prod \
  --resource-group rg-apim-cache --query primaryKey -o tsv)

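# Register Redis as the external cache through the ARM REST API.
# (The core "az apim" command group does not appear to include a cache subcommand;
# verify against your CLI version.)
SUB_ID=$(az account show --query id -o tsv)

az rest --method put \
  --url "https://management.azure.com/subscriptions/${SUB_ID}/resourceGroups/rg-apim/providers/Microsoft.ApiManagement/service/apim-prod/caches/redis-prod?api-version=2022-08-01" \
  --body "{\"properties\":{\"connectionString\":\"${REDIS_HOST}:6380,password=${REDIS_KEY},ssl=True,abortConnect=False\",\"useFromLocation\":\"default\",\"description\":\"Production Redis Cache\"}}"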

5.3 Bicep Configuration

resource apimCache 'Microsoft.ApiManagement/service/caches@2023-05-01-preview' = {
  parent: apimService
  name: 'redis-prod'
  properties: {
    connectionString: '${redisHost}:6380,password=${redisKey},ssl=True,abortConnect=False'
    useFromLocation: 'default'
    description: 'Production external cache'
  }
}

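Use caching-type="prefer-external" in policies: the gateway uses the external cache when one is configured and the internal cache otherwise, so the same policy works in environments with and without Redis. Do not assume this masks a Redis outage; test that failure mode explicitly.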


6. Cache Invalidation Patterns

6.1 Time-Based (TTL)

<!-- Dynamic TTL based on data type -->
<cache-store duration="@{
    var path = context.Request.Url.Path;
    if (path.Contains("/reference/")) return 86400;  // 24h
    if (path.Contains("/catalog/")) return 3600;     // 1h
    if (path.Contains("/inventory/")) return 30;     // 30s
    return 300;                                       // 5min default
}" />

6.2 Event-Based Invalidation

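<!-- Invalidation endpoint: POST /cache/invalidate
     Removes entries written with cache-store-value; responses stored by cache-store have no addressable key -->
<inbound>
    <base />
    <validate-jwt header-name="Authorization" require-scheme="Bearer">
        <!-- {{openid-config-url}} is a placeholder named value for the identity provider metadata endpoint -->
        <openid-config url="{{openid-config-url}}" />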
        <required-claims>
            <claim name="roles" match="any">
                <value>cache-admin</value>
            </claim>
        </required-claims>
    </validate-jwt>
    <set-variable name="req" value="@(context.Request.Body.As<JObject>())" />
    <cache-remove-value key="@(((JObject)context.Variables["req"])["cacheKey"].ToString())"
        caching-type="prefer-external" />
    <return-response>
        <set-status code="204" reason="Cache Invalidated" />
    </return-response>
</inbound>

6.3 Manual Purge via API

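# Selective purge by key pattern, assuming redis-cli 6+ and REDIS_HOST / REDIS_KEY from section 5.2.
# The 'apim:products:*' pattern is illustrative: inspect your keys first, because the
# layout APIM uses inside Redis is not a documented contract.
# --scan iterates incrementally and avoids the blocking KEYS command.
export REDISCLI_AUTH="$REDIS_KEY"
redis-cli -h "$REDIS_HOST" -p 6380 --tls --scan --pattern 'apim:products:*' \
  | xargs -r -n 100 redis-cli -h "$REDIS_HOST" -p 6380 --tls DEL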

6.4 Invalidation on Write Operations

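<policies>
    <inbound>
        <base />
        <choose>
            <when condition="@(context.Request.Method == "GET")">
                <!-- Explicit keys are required: cache-remove-value can only evict entries
                     written by cache-store-value, not responses stored by cache-store.
                     Keys use the path only, so query string variants share one entry. -->
                <cache-lookup-value key="@($"response-cache:{context.Request.Url.Path}")"
                    variable-name="cachedBody" caching-type="prefer-external" />
                <choose>
                    <when condition="@(context.Variables.ContainsKey("cachedBody"))">
                        <return-response>
                            <set-status code="200" reason="OK" />
                            <!-- Assumes a JSON API; cache the content type too if it varies -->
                            <set-header name="Content-Type" exists-action="override">
                                <value>application/json</value>
                            </set-header>
                            <set-body>@((string)context.Variables["cachedBody"])</set-body>
                        </return-response>
                    </when>
                </choose>
            </when>
        </choose>
    </inbound>
    <outbound>
        <base />
        <choose>
            <when condition="@(context.Request.Method == "GET" && context.Response.StatusCode == 200)">
                <cache-store-value key="@($"response-cache:{context.Request.Url.Path}")"
                    value="@(context.Response.Body.As<string>(preserveContent: true))"
                    duration="3600" caching-type="prefer-external" />
            </when>
            <when condition="@((context.Request.Method == "PUT" ||
                               context.Request.Method == "POST" ||
                               context.Request.Method == "DELETE") &&
                              context.Response.StatusCode >= 200 &&
                              context.Response.StatusCode < 300)">
                <!-- Invalidate the specific resource -->
                <cache-remove-value key="@($"response-cache:{context.Request.Url.Path}")"
                    caching-type="prefer-external" />
                <!-- Invalidate the collection endpoint (parent path) -->
                <cache-remove-value key="@{
                    var segments = context.Request.Url.Path.Split('/');
                    return $"response-cache:{string.Join("/", segments.Take(segments.Length - 1))}";
                }" caching-type="prefer-external" />
            </when>
        </choose>
    </outbound>
</policies>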
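The key derivation used on writes (the resource itself plus its parent collection) can be checked in isolation. The sketch below assumes the `response-cache:` prefix from the policy above; `keys_to_invalidate` is a hypothetical helper, and unlike the policy it skips the parent key when the path has only one segment.

```python
from typing import List


def keys_to_invalidate(path: str, prefix: str = "response-cache:") -> List[str]:
    """Return the resource key and, when one exists, its parent collection key."""
    normalized = "/" + path.strip("/")
    keys = [prefix + normalized]
    parent = normalized.rsplit("/", 1)[0]
    if parent:  # "/products/42" -> "/products"; top-level paths have no parent
        keys.append(prefix + parent)
    return keys
```

A `PUT /api/products/42` therefore evicts both the item and the `/api/products` listing, which is what keeps list views consistent after a write.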

7. Advanced Patterns

7.1 Cache Warming

Pre-populate cache after deployments or flushes:

<!-- POST /internal/warm-cache
     Warms entries that are read via cache-lookup-value under the same key.
     It does not populate the response cache used by cache-lookup / cache-store. -->
<inbound>
    <base />
    <validate-jwt header-name="Authorization" require-scheme="Bearer">
        <!-- {{openid-config-url}} is a placeholder named value for the identity provider metadata endpoint -->
        <openid-config url="{{openid-config-url}}" />
        <required-claims>
            <claim name="roles" match="any"><value>cache-admin</value></claim>
        </required-claims>
    </validate-jwt>
    <send-request mode="new" response-variable-name="productsResp">
        <set-url>https://backend-api.internal/api/products?page=1&amp;pageSize=100</set-url>
        <set-method>GET</set-method>
    </send-request>
    <cache-store-value
        key="response-cache:/api/products?page=1&amp;pageSize=100"
        value="@(((IResponse)context.Variables["productsResp"]).Body.As<string>())"
        duration="3600" caching-type="prefer-external" />
    <return-response>
        <set-status code="200" reason="Cache Warmed" />
    </return-response>
</inbound>

7.2 Stale-While-Revalidate

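The policy below approximates stale-while-revalidate with two cached values: the payload and the time it was stored. Entries younger than 5 minutes are returned directly; older or missing entries fall through to the backend, and the outbound section rewrites both values. Note that this refresh happens synchronously on the triggering request, not in a background job: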

<inbound>
    <base />
    <cache-lookup-value key="@($"data:{context.Request.Url.Path}")"
        variable-name="cachedData" caching-type="prefer-external" />
    <cache-lookup-value key="@($"ts:{context.Request.Url.Path}")"
        variable-name="cachedTs" caching-type="prefer-external" />
    <choose>
        <!-- Fresh data (< 5 min old): return immediately -->
        <when condition="@{
            if (!context.Variables.ContainsKey("cachedData") ||
                !context.Variables.ContainsKey("cachedTs")) return false;
            var age = DateTimeOffset.UtcNow.ToUnixTimeSeconds() -
                long.Parse((string)context.Variables["cachedTs"]);
            return age < 300;
        }">
            <return-response>
                <set-status code="200" reason="OK" />
                <set-header name="X-Cache" exists-action="override">
                    <value>HIT-FRESH</value>
                </set-header>
                <set-body>@((string)context.Variables["cachedData"])</set-body>
            </return-response>
        </when>
        <!-- Older or missing entries: fall through to the backend; the outbound section refreshes the cache -->
    </choose>
</inbound>
<outbound>
    <base />
    <choose>
        <!-- Only refresh the cache from successful responses -->
        <when condition="@(context.Response.StatusCode == 200)">
            <cache-store-value key="@($"data:{context.Request.Url.Path}")"
                value="@(context.Response.Body.As<string>(preserveContent: true))"
                duration="1800" caching-type="prefer-external" />
            <cache-store-value key="@($"ts:{context.Request.Url.Path}")"
                value="@(DateTimeOffset.UtcNow.ToUnixTimeSeconds().ToString())"
                duration="1800" caching-type="prefer-external" />
        </when>
    </choose>
</outbound>
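The age check at the heart of this pattern is a three-way classification. The sketch below uses the 5-minute freshness window and 15-minute grace window mentioned in the policy comments; `Freshness` and `classify` are illustrative names, not part of any APIM API.

```python
from enum import Enum


class Freshness(Enum):
    FRESH = "fresh"      # serve from cache
    STALE = "stale"      # usable, but should be refreshed
    EXPIRED = "expired"  # must go to the backend


def classify(stored_at: int, now: int,
             fresh_for: int = 300, stale_for: int = 900) -> Freshness:
    """Classify a cached entry by its age in seconds."""
    age = now - stored_at
    if age < fresh_for:
        return Freshness.FRESH
    if age < stale_for:
        return Freshness.STALE
    return Freshness.EXPIRED
```

Keeping the cache entry's own TTL (1800 seconds in the policy) longer than `stale_for` is what leaves a stale copy available to serve during the grace window.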

7.3 Cache Bypass for Specific Clients

<inbound>
    <base />
    <choose>
        <when condition="@(context.Request.Headers
            .GetValueOrDefault("Cache-Control", "") == "no-cache")">
            <!-- Skip cache-lookup -->
        </when>
        <otherwise>
            <cache-lookup vary-by-developer="false" vary-by-developer-groups="false"
                          caching-type="prefer-external">
                <vary-by-header>Accept</vary-by-header>
                <vary-by-query-parameter>*</vary-by-query-parameter>
            </cache-lookup>
        </otherwise>
    </choose>
</inbound>
<outbound>
    <base />
    <choose>
        <when condition="@(context.Request.Headers
            .GetValueOrDefault("Cache-Control", "") != "no-cache")">
            <cache-store duration="3600" />
        </when>
    </choose>
</outbound>

7.4 Per-User vs Shared Caching

Shared (default): Same response for all users — use for public/anonymous data.

Per-user: Vary by user identity extracted from JWT:

<cache-lookup vary-by-developer="false" vary-by-developer-groups="false">
    <vary-by-custom>@{
        var auth = context.Request.Headers.GetValueOrDefault("Authorization", "");
        if (string.IsNullOrEmpty(auth)) return "anonymous";
        var token = auth.Replace("Bearer ", "");
        var parts = token.Split('.');
        if (parts.Length != 3) return "unknown";
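        // Decode only: this does not verify the token, so run validate-jwt first.
        // JWT segments are base64url: map to the standard alphabet, then restore padding.
        var b64 = parts[1].Replace('-', '+').Replace('_', '/');
        var payload = System.Text.Encoding.UTF8.GetString(
            Convert.FromBase64String(b64.PadRight(
                b64.Length + (4 - b64.Length % 4) % 4, '=')));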
        return JObject.Parse(payload)["sub"]?.ToString() ?? "unknown";
    }</vary-by-custom>
</cache-lookup>
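The decoding step is the part that usually goes wrong, because JWT segments use the base64url alphabet and strip padding. The sketch below shows the same extraction in Python. It reads the `sub` claim without verifying the signature, so it is only safe after the token has been validated elsewhere; `subject_from_jwt` is a hypothetical helper.

```python
import base64
import json
from typing import Optional


def subject_from_jwt(token: str) -> Optional[str]:
    """Read the `sub` claim from a JWT payload WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    try:
        # urlsafe_b64decode handles the '-' and '_' characters of base64url.
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    sub = claims.get("sub") if isinstance(claims, dict) else None
    return sub if isinstance(sub, str) else None
```

Returning `None` for anything malformed lets the caller choose a safe default, such as bypassing the cache, instead of grouping all bad tokens under one shared key.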

8. Monitoring Cache Performance

KQL: Cache Hit Ratio Over Time

ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| where Method == "GET"
| extend CacheHit = BackendTime == 0 or isnull(BackendTime)
| summarize
    TotalRequests = count(),
    CacheHits = countif(CacheHit),
    HitRatio = round(100.0 * countif(CacheHit) / count(), 2)
    by bin(TimeGenerated, 1h)
| order by TimeGenerated desc

KQL: Performance by API Operation

ApiManagementGatewayLogs
| where TimeGenerated > ago(7d)
| where Method == "GET"
| extend CacheHit = BackendTime == 0 or isnull(BackendTime)
| summarize
    Requests = count(),
    HitRatio = round(100.0 * countif(CacheHit) / count(), 2),
    AvgLatencyMs = round(avg(TotalTime), 1),
    P95LatencyMs = round(percentile(TotalTime, 95), 1)
    by ApiId, OperationId
| order by Requests desc

KQL: Latency — Cached vs Uncached

ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| where Method == "GET"
| extend CacheHit = BackendTime == 0 or isnull(BackendTime)
| summarize
    CachedAvgMs = round(avgif(TotalTime, CacheHit), 2),
    UncachedAvgMs = round(avgif(TotalTime, not(CacheHit)), 2),
    SpeedupFactor = round(avgif(TotalTime, not(CacheHit)) / avgif(TotalTime, CacheHit), 1)
    by ApiId

KQL: Redis Memory Monitoring

AzureMetrics
| where ResourceProvider == "MICROSOFT.CACHE"
| where MetricName in ("usedmemory", "cachehits", "cachemisses")
| where TimeGenerated > ago(24h)
| summarize AvgValue = avg(Average) by bin(TimeGenerated, 5m), MetricName
| render timechart

Azure Monitor Alert

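# A raw hit count below 70 says nothing about the ratio, so this alert watches the
# Redis miss rate instead. Assumption: the Redis resource exposes a "cachemissrate"
# percentage metric; confirm the name with "az monitor metrics list-definitions".
az monitor metrics alert create \
  --name "low-cache-hit-ratio" \
  --resource-group rg-apim-cache \
  --scopes "/subscriptions/{sub}/resourceGroups/rg-apim-cache/providers/Microsoft.Cache/Redis/redis-apim-cache-prod" \
  --condition "avg cachemissrate > 30" \
  --window-size 15m \
  --evaluation-frequency 5m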

9. Common Pitfalls and Troubleshooting

Caching Authenticated Responses Across Users

Problem: vary-by-developer="false" with user-specific data → User A sees User B's data.

Fix: Use vary-by-developer="true" or vary by Authorization header / JWT claim.

Cache Key Explosion

Problem: Too many vary-by dimensions → millions of entries, low hit ratio, Redis memory bloat.

Fix: Reduce cardinality. Vary by role/tier instead of unique user token. Normalize keys.
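A quick way to spot key explosion before it happens is to multiply the number of distinct values per vary-by dimension. The sketch below computes that upper bound; the dimension names and counts in the usage note are made up.

```python
from math import prod
from typing import Mapping


def max_cache_entries(distinct_values: Mapping[str, int]) -> int:
    """Upper bound on entries per operation: product of distinct values per dimension."""
    return prod(distinct_values.values())
```

For example, 50 pages, 3 page sizes, and 10 languages give at most 1,500 entries. Adding a per-user dimension with 100,000 users raises the bound to 150,000,000, which is why varying by role or tier is usually the better trade.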

Caching Error Responses

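Problem: With cache-response="true", cache-store caches whatever response is current, including 500 errors, for the full TTL. (Without that attribute, only 200 OK responses are cached.)

Fix: Set cache-response="true" only inside a <choose> branch that checks the status code (see Section 4.4).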

Cache Not Working After Deployment

Checklist:

  1. cache-lookup placement in inbound policy
  2. Request method is GET (cache-lookup ignores non-GET by default)
  3. External cache connection: az apim cache show --service-name apim-prod --cache-id redis-prod
  4. Redis connectivity (NSG rules, Private Endpoint)
  5. cache-store in outbound actually executes (not short-circuited)

Large Responses Exceeding Limits

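Problem: Cached responses are subject to a per-entry size limit (check the current API Management service limits for the exact figure). Responses above that limit are silently not cached.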

Fix: Use caching-type="external" for large payloads or compress responses.

CORS Preflight Issues

Fix: Only cache GET requests explicitly via <choose> condition. Include Origin in vary-by-header.


10. Production Best Practices

  1. Start with the internal cache, then move to Redis for multi-instance production
  2. Always cache conditionally: only 2xx responses (optionally 404 with a short TTL)
  3. Set TTLs by data type: reference data (24h), catalog (1h), inventory (30s)
  4. Invalidate on writes; don't rely solely on TTL
  5. Monitor hit ratios and target >80% for read-heavy APIs
  6. Use prefer-external so policies work with or without an external cache configured
  7. Vary-by minimally: each added dimension multiplies the number of entries
  8. Size Redis for the working set and set the allkeys-lru eviction policy
  9. Never cache Set-Cookie responses or user-specific data without proper vary-by
  10. Set downstream-caching-type="none" for authenticated APIs

TTL Guidelines

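Data Type                              | TTL                   | Invalidation
---------------------------------------|-----------------------|--------------------
Reference data (countries, currencies) | 24h                   | Deploy-time refresh
Product catalog                        | 1h                    | Event-based
Search results                         | 5-15min               | TTL only
Inventory/stock                        | 15-30s                | TTL only
Auth tokens                            | Token lifetime - 5min | TTL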

Summary

Pattern                    | Use Case                               | Complexity
---------------------------|----------------------------------------|------------
Basic cache-lookup/store   | Simple GET APIs                        | Low
Vary-by strategies         | Multiple response variants             | Low-Medium
Fragment caching           | Tokens, config, partial data           | Medium
External Redis             | Multi-instance, persistence            | Medium
Write-through invalidation | CRUD APIs needing consistency          | Medium-High
Stale-while-revalidate     | Latency-sensitive, staleness-tolerant  | High
Cache warming              | Post-deployment critical paths         | Medium

The golden rule: Cache aggressively, invalidate precisely, monitor continuously.