APIM — Response Caching and Cache Invalidation Strategies

A comprehensive guide to caching in Azure API Management — from basic response caching to advanced patterns with external Redis, invalidation strategies, and production monitoring.

1. Why Caching Matters in API Management

APIM sits between consumers and backends, making it the ideal layer to intercept repeated requests.

Performance: Cached responses are typically served in single-digit milliseconds, versus hundreds of milliseconds for a backend call, because a cache hit skips the backend network hop, database connection, and compute.

Cost: A product catalog API with 95% cache hit ratio reduces backend DB queries by 95%. Fewer requests means lower App Service/AKS/Function costs and reduced egress bandwidth.
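
To make the arithmetic concrete, here is a minimal Python sketch of how hit ratio translates into backend load and average latency. The traffic volume and latency figures are made-up illustrations, not measurements.

```python
def backend_requests(total_requests: int, hit_ratio: float) -> int:
    """Requests that still reach the backend for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return round(total_requests * (1.0 - hit_ratio))


def average_latency_ms(hit_ratio: float, cached_ms: float, backend_ms: float) -> float:
    """Expected latency when hits cost cached_ms and misses cost backend_ms."""
    return hit_ratio * cached_ms + (1.0 - hit_ratio) * backend_ms


if __name__ == "__main__":
    # Hypothetical traffic: 1,000,000 GETs per day, 5 ms cached, 200 ms uncached.
    print(backend_requests(1_000_000, 0.95))
    print(average_latency_ms(0.95, 5.0, 200.0))
```

With these assumed numbers, 50,000 of the 1,000,000 requests reach the backend and the average latency comes out at roughly 14.75 ms.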

Scalability: Cache absorbs traffic spikes (flash sales, viral content) and shields backends from thundering herd problems. External Redis scales independently of APIM instances.

When NOT to cache: Real-time data with zero staleness tolerance, user-specific mutations (POST/PUT/DELETE), responses with sensitive per-user data that could leak, streaming/SSE endpoints.

2. Architecture: Cache Flow

┌──────────────────────────────────────────────────────────────┐
│                    API Consumer Request                      │
└─────────────────────────┬────────────────────────────────────┘
                          ▼
┌──────────────────────────────────────────────────────────────┐
│              Azure API Management Gateway                    │
│                                                              │
│  INBOUND:  cache-lookup ──► Cache Hit? ──YES──► Return       │
│                                  │                           │
│                                  NO                          │
│                                  ▼                           │
│                          Forward to Backend                  │
│                                  │                           │
│  OUTBOUND: cache-store ◄─────────┘                           │
│                                                              │
│  STORAGE:  ┌─────────────────┐  ┌────────────────────────┐   │
│            │ Internal Cache  │  │ External Redis Cache   │   │
│            │ (per-instance)  │  │ (shared, persistent)   │   │
│            └─────────────────┘  └────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘

Flow: Request arrives → cache-lookup checks cache → hit returns immediately → miss forwards to backend → cache-store saves response → subsequent identical requests served from cache until TTL expires.
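
The flow can be sketched outside APIM as a small TTL cache in Python. This is an illustration of the lookup/store sequence only; the class and function names are hypothetical and it is not how the gateway is implemented.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Tiny in-memory stand-in for the gateway cache (not APIM's implementation)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def lookup(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def store(self, key: str, value: str, ttl_seconds: float) -> None:
        self._entries[key] = (self._clock() + ttl_seconds, value)


def handle_get(
    cache: TtlCache, key: str, backend: Callable[[], str], ttl_seconds: float
) -> Tuple[str, bool]:
    """Return (response, served_from_cache), mirroring cache-lookup then cache-store."""
    cached = cache.lookup(key)
    if cached is not None:
        return cached, True
    response = backend()  # miss: forward to the backend
    cache.store(key, response, ttl_seconds)
    return response, False
```

The clock is injectable so expiry can be exercised in tests without sleeping.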

3. Types of Caching in APIM

Internal Cache (Built-in)

Per-instance, in-memory cache. Zero setup, sub-millisecond latency. Limited capacity (Developer: 10MB, Standard: 1GB, Premium: 5GB per unit). Cache lost on restart, not shared across instances.

External Cache (Azure Redis)

Shared across all APIM instances and regions. Survives restarts, supports up to 1.2TB (Premium clustered), enables advanced features like pub/sub invalidation. Requires provisioning and connection setup.

Use-Case Scenarios

Scenario	Recommended cache
Single instance, low traffic	Internal
Multi-instance / multi-region	External Redis
Responses larger than the internal cache's per-entry limit	External Redis
Need persistence across restarts	External Redis
Sub-millisecond latency critical	Internal

4. Step-by-Step Implementation

4.1 Basic Response Caching

<policies>
    <inbound>
        <base />
        <cache-lookup vary-by-developer="false"
                      vary-by-developer-groups="false"
                      caching-type="prefer-external"
                      downstream-caching-type="none">
            <vary-by-header>Accept</vary-by-header>
            <vary-by-query-parameter>api-version</vary-by-query-parameter>
        </cache-lookup>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
        <cache-store duration="3600" />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

Key attributes: duration (TTL seconds), caching-type (internal/external/prefer-external), downstream-caching-type (controls Cache-Control header: none/private/public).

4.2 Vary-By Strategies

<cache-lookup vary-by-developer="false" vary-by-developer-groups="false">
    <!-- Separate entries per content type and language -->
    <vary-by-header>Accept</vary-by-header>
    <vary-by-header>Accept-Language</vary-by-header>
    <!-- Pagination and filtering -->
    <vary-by-query-parameter>page</vary-by-query-parameter>
    <vary-by-query-parameter>pageSize</vary-by-query-parameter>
    <vary-by-query-parameter>category</vary-by-query-parameter>
    <!-- To vary by ALL query parameters, omit vary-by-query-parameter entirely (documented default) -->
    <!-- Per-developer (subscription) entries: set vary-by-developer="true" on cache-lookup -->
</cache-lookup>

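Custom cache key with C# expression (cache-lookup accepts only vary-by-header and vary-by-query-parameter children, so the computed key is placed in a request header first):

<set-header name="X-Cache-Key" exists-action="override">
    <value>@{
        var tenantId = context.Request.Headers
            .GetValueOrDefault("X-Tenant-Id", "default");
        var role = context.Request.Headers
            .GetValueOrDefault("X-User-Role", "anonymous");
        return $"{tenantId}:{role}";
    }</value>
</set-header>
<cache-lookup vary-by-developer="false" vary-by-developer-groups="false">
    <vary-by-header>Accept</vary-by-header>
    <!-- X-Cache-Key is an illustrative header name -->
    <vary-by-header>X-Cache-Key</vary-by-header>
</cache-lookup>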

4.3 Cache Key Composition

Default key components: HTTP method + URL path + vary-by-query-parameter values + vary-by-header values + developer identity.
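
As an illustration of how those components combine, here is a minimal Python sketch that builds a deterministic key. APIM's real internal key format is not documented, so the separator, prefixes, and function name below are assumptions chosen for readability.

```python
from typing import Iterable, Mapping


def build_cache_key(
    method: str,
    path: str,
    query: Mapping[str, str],
    headers: Mapping[str, str],
    vary_by_query: Iterable[str] = (),
    vary_by_headers: Iterable[str] = (),
    developer_id: str = "",
) -> str:
    """Compose a deterministic key from the parts a request is varied by."""
    lowered_headers = {name.lower(): value for name, value in headers.items()}
    parts = [method.upper(), path]
    # Sort names so parameter order in the URL does not create duplicate entries.
    for name in sorted(vary_by_query):
        parts.append(f"q:{name}={query.get(name, '')}")
    # Header names are case-insensitive, so compare them in lower case.
    for name in sorted(h.lower() for h in vary_by_headers):
        parts.append(f"h:{name}={lowered_headers.get(name, '')}")
    if developer_id:
        parts.append(f"dev:{developer_id}")
    return "|".join(parts)
```

Query parameters that are not listed in vary_by_query (tracking parameters, for example) do not affect the key, which is what keeps the number of entries down.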

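Geographic-aware key (computed into a request header that cache-lookup then varies by, since cache-lookup has no custom-key child element; X-Cache-Geo is an illustrative name):

<set-header name="X-Cache-Geo" exists-action="override">
    <value>@{
        var region = context.Request.Headers
            .GetValueOrDefault("X-Forwarded-Region", context.Deployment.Region);
        var currency = context.Request.Headers
            .GetValueOrDefault("X-Currency", "USD");
        return $"{region}:{currency}";
    }</value>
</set-header>
<!-- Then, inside cache-lookup: <vary-by-header>X-Cache-Geo</vary-by-header> -->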

4.4 Conditional Caching Based on Response Codes

<outbound>
    <base />
    <choose>
        <when condition="@(context.Response.StatusCode == 200)">
            <cache-store duration="3600" />
        </when>
        <when condition="@(context.Response.StatusCode == 404)">
            <cache-store duration="300" />
        </when>
        <!-- Other 4xx/5xx responses are not cached -->
    </choose>
</outbound>

Dynamic TTL respecting backend Cache-Control:

<outbound>
    <base />
    <choose>
        <when condition="@(context.Response.StatusCode >= 200 && context.Response.StatusCode < 300)">
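            <cache-store duration="@{
                // Use the backend's max-age when it is present and numeric; otherwise default to 1 hour
                var cc = context.Response.Headers.GetValueOrDefault("Cache-Control", "");
                var maxAge = cc.Split(',').Select(s => s.Trim())
                    .FirstOrDefault(s => s.StartsWith("max-age=", StringComparison.OrdinalIgnoreCase));
                int seconds = 3600;
                if (maxAge == null) { return seconds; }
                // TryParse avoids a policy error when the value is malformed ("max-age=" is 8 characters)
                return int.TryParse(maxAge.Substring(8), out seconds) ? seconds : 3600;
            }" />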
        </when>
    </choose>
</outbound>
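
For testing the TTL rule outside the gateway, a minimal Python sketch of the same max-age extraction follows, with a guard for malformed values. The function name and the 3600-second default mirror the policy but are otherwise illustrative.

```python
def ttl_from_cache_control(header_value: str, default_ttl: int = 3600) -> int:
    """Return max-age in seconds, or default_ttl when absent or malformed."""
    for directive in header_value.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            value = value.strip().strip('"')
            if value.isdigit():
                return int(value)
            return default_ttl  # max-age present but not a number
    return default_ttl
```

A production version would probably also honor no-store and private before caching at all; that is left out here to keep the sketch focused on the TTL.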

4.5 Fragment Caching (cache-lookup-value / cache-store-value)

Fragment caching stores individual values — useful for tokens, config, or partial data:

<inbound>
    <base />
    <cache-lookup-value key="@("backend-token:" +
        context.Request.Headers.GetValueOrDefault("X-Tenant-Id", "default"))"
        variable-name="cachedToken"
        caching-type="prefer-external" />
    <choose>
        <when condition="@(!context.Variables.ContainsKey("cachedToken"))">
            <send-request mode="new" response-variable-name="tokenResponse">
                <set-url>https://login.microsoftonline.com/tenant/oauth2/v2.0/token</set-url>
                <set-method>POST</set-method>
                <set-header name="Content-Type" exists-action="override">
                    <value>application/x-www-form-urlencoded</value>
                </set-header>
                <set-body>grant_type=client_credentials&amp;client_id={{client-id}}&amp;client_secret={{client-secret}}&amp;scope={{scope}}</set-body>
            </send-request>
            <set-variable name="cachedToken"
                value="@(((IResponse)context.Variables["tokenResponse"]).Body.As<JObject>()["access_token"].ToString())" />
            <cache-store-value key="@("backend-token:" +
                context.Request.Headers.GetValueOrDefault("X-Tenant-Id", "default"))"
                value="@((string)context.Variables["cachedToken"])"
                duration="3300"
                caching-type="prefer-external" />
        </when>
    </choose>
    <set-header name="Authorization" exists-action="override">
        <value>@("Bearer " + (string)context.Variables["cachedToken"])</value>
    </set-header>
</inbound>

5. External Cache with Azure Redis

5.1 Provisioning

az group create --name rg-apim-cache --location eastus2

# The non-TLS port (6379) is disabled by default, so no flag is needed to turn it off
az redis create \
  --name redis-apim-cache-prod \
  --resource-group rg-apim-cache \
  --location eastus2 \
  --sku Premium --vm-size p1 \
  --minimum-tls-version 1.2 \
  --redis-configuration '{"maxmemory-policy": "allkeys-lru"}'

5.2 Connecting to APIM

REDIS_HOST=$(az redis show --name redis-apim-cache-prod \
  --resource-group rg-apim-cache --query hostName -o tsv)
REDIS_KEY=$(az redis list-keys --name redis-apim-cache-prod \
  --resource-group rg-apim-cache --query primaryKey -o tsv)

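# Register Redis as the external cache through the ARM REST API
# (Microsoft.ApiManagement/service/caches). SUBSCRIPTION_ID is assumed to be set.
az rest --method put \
  --url "https://management.azure.com/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/rg-apim/providers/Microsoft.ApiManagement/service/apim-prod/caches/redis-prod?api-version=2022-08-01" \
  --body "{\"properties\": {\"connectionString\": \"${REDIS_HOST}:6380,password=${REDIS_KEY},ssl=True,abortConnect=False\", \"useFromLocation\": \"default\", \"description\": \"Production Redis Cache\"}}"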

5.3 Bicep Configuration

resource apimCache 'Microsoft.ApiManagement/service/caches@2023-05-01-preview' = {
  parent: apimService
  name: 'redis-prod'
  properties: {
    connectionString: '${redisHost}:6380,password=${redisKey},ssl=True,abortConnect=False'
    useFromLocation: 'default'
    description: 'Production external cache'
  }
}

Use caching-type="prefer-external" in policies so the same policy works in every environment: it uses the external cache when one is configured and the built-in cache otherwise.

6. Cache Invalidation Patterns

6.1 Time-Based (TTL)

<!-- Dynamic TTL based on data type -->
<cache-store duration="@{
    var path = context.Request.Url.Path;
    if (path.Contains("/reference/")) return 86400;  // 24h
    if (path.Contains("/catalog/")) return 3600;     // 1h
    if (path.Contains("/inventory/")) return 30;     // 30s
    return 300;                                       // 5min default
}" />

6.2 Event-Based Invalidation

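<!-- Invalidation endpoint: POST /cache/invalidate. cache-remove-value evicts entries written by cache-store-value; it cannot evict response entries written by cache-store. validate-jwt also needs an openid-config or issuer-signing-keys child for your identity provider. -->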
<inbound>
    <base />
    <validate-jwt header-name="Authorization" require-scheme="Bearer">
        <required-claims>
            <claim name="roles" match="any">
                <value>cache-admin</value>
            </claim>
        </required-claims>
    </validate-jwt>
    <set-variable name="req" value="@(context.Request.Body.As<JObject>())" />
    <cache-remove-value key="@(((JObject)context.Variables["req"])["cacheKey"].ToString())"
        caching-type="prefer-external" />
    <return-response>
        <set-status code="204" reason="Cache Invalidated" />
    </return-response>
</inbound>

6.3 Manual Purge via API

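# Selective purge by key pattern with redis-cli (6.0+ for --tls).
# SCAN is used instead of KEYS because KEYS blocks the server while it walks the keyspace.
# The key pattern is illustrative: APIM's Redis key layout is not a documented contract.
redis-cli -h "$REDIS_HOST" -p 6380 --tls -a "$REDIS_KEY" --scan --pattern 'apim:products:*' \
  | xargs -n 100 redis-cli -h "$REDIS_HOST" -p 6380 --tls -a "$REDIS_KEY" DEL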

6.4 Invalidation on Write Operations

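<policies>
    <inbound>
        <base />
        <choose>
            <when condition="@(context.Request.Method == "GET")">
                <!-- Explicit keys are used because cache-remove-value can only evict entries
                     written by cache-store-value, not response entries written by cache-store.
                     The key is the URL path only; add the query string if responses vary by it. -->
                <cache-lookup-value key="@($"response-cache:{context.Request.Url.Path}")"
                    variable-name="cachedBody" caching-type="prefer-external" />
                <choose>
                    <when condition="@(context.Variables.ContainsKey("cachedBody"))">
                        <return-response>
                            <set-status code="200" reason="OK" />
                            <!-- Assumes JSON responses -->
                            <set-header name="Content-Type" exists-action="override">
                                <value>application/json</value>
                            </set-header>
                            <set-body>@((string)context.Variables["cachedBody"])</set-body>
                        </return-response>
                    </when>
                </choose>
            </when>
        </choose>
    </inbound>
    <outbound>
        <base />
        <choose>
            <when condition="@(context.Request.Method == "GET" && context.Response.StatusCode == 200)">
                <cache-store-value key="@($"response-cache:{context.Request.Url.Path}")"
                    value="@(context.Response.Body.As<string>(preserveContent: true))"
                    duration="3600" caching-type="prefer-external" />
            </when>
            <when condition="@((context.Request.Method == "PUT" ||
                               context.Request.Method == "POST" ||
                               context.Request.Method == "DELETE") &&
                              context.Response.StatusCode >= 200 &&
                              context.Response.StatusCode < 300)">
                <!-- Invalidate the specific resource -->
                <cache-remove-value key="@($"response-cache:{context.Request.Url.Path}")"
                    caching-type="prefer-external" />
                <!-- Invalidate the collection endpoint -->
                <cache-remove-value key="@{
                    var segments = context.Request.Url.Path.Split('/');
                    return $"response-cache:{string.Join("/", segments.Take(segments.Length - 1))}";
                }" caching-type="prefer-external" />
            </when>
        </choose>
    </outbound>
</policies>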
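
The key derivation in the write branch (the resource itself plus its parent collection) is easy to get wrong at path boundaries, so here is a small Python sketch of the same logic that can be unit tested. The function name is hypothetical; the response-cache: prefix is taken from the policy above.

```python
from typing import List


def keys_to_invalidate(path: str, prefix: str = "response-cache:") -> List[str]:
    """Return the cache keys for a resource path and its parent collection."""
    normalized = "/" + path.strip("/")
    keys = [prefix + normalized]
    parent = normalized.rsplit("/", 1)[0]
    if parent:  # top-level paths such as /products have no parent collection
        keys.append(prefix + parent)
    return keys
```

For example, a PUT to /api/products/42 yields keys for /api/products/42 and /api/products, so both the item and the list are refreshed on the next read.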

7. Advanced Patterns

7.1 Cache Warming

Pre-populate cache after deployments or flushes:

<!-- POST /internal/warm-cache -->
<inbound>
    <base />
    <validate-jwt header-name="Authorization" require-scheme="Bearer">
        <required-claims>
            <claim name="roles" match="any"><value>cache-admin</value></claim>
        </required-claims>
    </validate-jwt>
    <send-request mode="new" response-variable-name="productsResp">
        <set-url>https://backend-api.internal/api/products?page=1&amp;pageSize=100</set-url>
        <set-method>GET</set-method>
    </send-request>
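    <!-- Readable only via cache-lookup-value with the same key; cache-lookup does not see this entry -->
    <cache-store-value
        key="response-cache:/api/products?page=1&amp;pageSize=100"
        value="@(((IResponse)context.Variables["productsResp"]).Body.As<string>())"
        duration="3600" caching-type="prefer-external" />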
    <return-response>
        <set-status code="200" reason="Cache Warmed" />
    </return-response>
</inbound>

7.2 Stale-While-Revalidate

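Serve cached data while it is fresh and refresh it once it ages past a soft TTL. The policy below implements the soft-TTL half: entries live in the cache for 30 minutes but are treated as fresh for only 5. A true background refresh would additionally need a fire-and-forget call (for example send-one-way-request) to a refresh operation: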

<inbound>
    <base />
    <cache-lookup-value key="@($"data:{context.Request.Url.Path}")"
        variable-name="cachedData" caching-type="prefer-external" />
    <cache-lookup-value key="@($"ts:{context.Request.Url.Path}")"
        variable-name="cachedTs" caching-type="prefer-external" />
    <choose>
        <!-- Fresh data (< 5 min old): return immediately -->
        <when condition="@{
            if (!context.Variables.ContainsKey("cachedData") ||
                !context.Variables.ContainsKey("cachedTs")) return false;
            var age = DateTimeOffset.UtcNow.ToUnixTimeSeconds() -
                long.Parse((string)context.Variables["cachedTs"]);
            return age < 300;
        }">
            <return-response>
                <set-status code="200" reason="OK" />
                <set-header name="X-Cache" exists-action="override">
                    <value>HIT-FRESH</value>
                </set-header>
                <set-body>@((string)context.Variables["cachedData"])</set-body>
            </return-response>
        </when>
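        <!-- Older than 5 min or missing: fall through to the backend; the outbound section refreshes the entry. Serving stale data during a 5-15 min grace window would need an extra branch that returns cachedData and triggers a refresh, e.g. with send-one-way-request. -->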
    </choose>
</inbound>
<outbound>
    <base />
    <choose>
        <!-- Refresh the entry only from successful responses -->
        <when condition="@(context.Response.StatusCode == 200)">
            <cache-store-value key="@($"data:{context.Request.Url.Path}")"
                value="@(context.Response.Body.As<string>(preserveContent: true))"
                duration="1800" caching-type="prefer-external" />
            <cache-store-value key="@($"ts:{context.Request.Url.Path}")"
                value="@(DateTimeOffset.UtcNow.ToUnixTimeSeconds().ToString())"
                duration="1800" caching-type="prefer-external" />
        </when>
    </choose>
</outbound>
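
The freshness decision behind this pattern can be expressed as a three-way classification by entry age. The Python sketch below assumes the 5-minute fresh window and 15-minute grace window mentioned in the policy comments; the function name and return labels are illustrative.

```python
def classify_entry(
    age_seconds: float, fresh_seconds: float = 300, grace_seconds: float = 900
) -> str:
    """Classify a cached entry as 'fresh', 'stale', or 'expired' by its age."""
    if age_seconds < 0:
        raise ValueError("age_seconds must not be negative")
    if age_seconds < fresh_seconds:
        return "fresh"  # serve from cache
    if age_seconds < grace_seconds:
        return "stale"  # serve from cache and trigger a refresh
    return "expired"  # go to the backend
```

Keeping the cache duration (1800 seconds in the policy) longer than the grace window ensures a stale entry is still present when it is needed.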

7.3 Cache Bypass for Specific Clients

<inbound>
    <base />
    <choose>
        <when condition="@(context.Request.Headers
            .GetValueOrDefault("Cache-Control", "") == "no-cache")">
            <!-- Skip cache-lookup -->
        </when>
        <otherwise>
            <cache-lookup vary-by-developer="false" vary-by-developer-groups="false"
                          caching-type="prefer-external">
                <vary-by-header>Accept</vary-by-header>
                <!-- No vary-by-query-parameter element: all query parameters are part of the key -->
            </cache-lookup>
        </otherwise>
    </choose>
</inbound>
<outbound>
    <base />
    <choose>
        <when condition="@(context.Request.Headers
            .GetValueOrDefault("Cache-Control", "") != "no-cache")">
            <cache-store duration="3600" />
        </when>
    </choose>
</outbound>

7.4 Per-User vs Shared Caching

Shared (default): Same response for all users — use for public/anonymous data.

Per-user: Vary by user identity extracted from JWT:

<!-- Run validate-jwt before this point so the token (and its sub claim) is verified.
     cache-lookup has no custom-key child element, so the user id is carried in a
     request header (X-Cache-User is an illustrative name) that the lookup varies by. -->
<set-header name="X-Cache-User" exists-action="override">
    <value>@{
        var auth = context.Request.Headers.GetValueOrDefault("Authorization", "");
        if (string.IsNullOrEmpty(auth)) return "anonymous";
        // AsJwt() handles base64url decoding and returns null for a malformed token
        var jwt = auth.Replace("Bearer ", "").AsJwt();
        return jwt?.Subject ?? "unknown";
    }</value>
</set-header>
<cache-lookup vary-by-developer="false" vary-by-developer-groups="false">
    <vary-by-header>X-Cache-User</vary-by-header>
</cache-lookup>
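
For reference, a JWT payload is base64url-encoded JSON with the padding stripped, which is why plain Base64 decoding needs care. Here is a minimal Python sketch of reading the sub claim outside APIM. It deliberately does not verify the signature, so it is suitable for building a cache key only after the token has been validated elsewhere; the function name is hypothetical.

```python
import base64
import json
from typing import Optional


def subject_from_jwt(token: str) -> Optional[str]:
    """Read the 'sub' claim from a JWT payload WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload = parts[1]
    # base64url segments omit padding; restore it to a multiple of four characters.
    padded = payload + "=" * (-len(payload) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad Base64, bad UTF-8, and bad JSON
        return None
    if not isinstance(claims, dict):
        return None
    sub = claims.get("sub")
    return sub if isinstance(sub, str) else None
```

Returning None for anything malformed lets the caller choose a safe fallback, such as skipping the cache entirely for that request.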

8. Monitoring Cache Performance

KQL: Cache Hit Ratio Over Time

ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| where Method == "GET"
| extend CacheHit = Cache == "hit"
| summarize
    TotalRequests = count(),
    CacheHits = countif(CacheHit),
    HitRatio = round(100.0 * countif(CacheHit) / count(), 2)
    by bin(TimeGenerated, 1h)
| order by TimeGenerated desc

KQL: Performance by API Operation

ApiManagementGatewayLogs
| where TimeGenerated > ago(7d)
| where Method == "GET"
| extend CacheHit = Cache == "hit"
| summarize
    Requests = count(),
    HitRatio = round(100.0 * countif(CacheHit) / count(), 2),
    AvgLatencyMs = round(avg(TotalTime), 1),
    P95LatencyMs = round(percentile(TotalTime, 95), 1)
    by ApiId, OperationId
| order by Requests desc

KQL: Latency — Cached vs Uncached

ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| where Method == "GET"
| extend CacheHit = Cache == "hit"
| summarize
    CachedAvgMs = round(avgif(TotalTime, CacheHit), 2),
    UncachedAvgMs = round(avgif(TotalTime, not(CacheHit)), 2),
    SpeedupFactor = round(avgif(TotalTime, not(CacheHit)) / avgif(TotalTime, CacheHit), 1)
    by ApiId

KQL: Redis Memory Monitoring

AzureMetrics
| where ResourceProvider == "MICROSOFT.CACHE"
| where MetricName in ("usedmemory", "cachehits", "cachemisses")
| where TimeGenerated > ago(24h)
| summarize AvgValue = avg(Average) by bin(TimeGenerated, 5m), MetricName
| render timechart

Azure Monitor Alert

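# APIM exposes no cache-hit metric, so the alert watches the Redis miss rate instead
# (miss rate above 30% is equivalent to a hit ratio below 70%)
az monitor metrics alert create \
  --name "low-cache-hit-ratio" \
  --resource-group rg-apim-cache \
  --scopes "/subscriptions/{sub}/resourceGroups/rg-apim-cache/providers/Microsoft.Cache/redis/redis-apim-cache-prod" \
  --condition "avg cachemissrate > 30" \
  --window-size 15m \
  --evaluation-frequency 5m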

9. Common Pitfalls and Troubleshooting

Caching Authenticated Responses Across Users

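Problem: vary-by-developer="false" with user-specific data → User A sees User B's data.

Fix: Vary by the Authorization header or a JWT claim. vary-by-developer="true" separates entries per developer (subscription), which is enough only when each end user has their own subscription.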

Cache Key Explosion

Problem: Too many vary-by dimensions → millions of entries, low hit ratio, Redis memory bloat.

Fix: Reduce cardinality. Vary by role/tier instead of unique user token. Normalize keys.
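
A quick way to spot key explosion before it happens is to multiply the number of distinct values per vary-by dimension. The Python sketch below does exactly that; the dimension names and counts in the usage note are hypothetical.

```python
from math import prod
from typing import Mapping


def max_cache_entries(distinct_values: Mapping[str, int]) -> int:
    """Upper bound on entries per operation: product of distinct values per dimension."""
    return prod(distinct_values.values())
```

Varying by Accept (2 values), Accept-Language (5), and page (50) allows at most 500 entries per operation. Adding a per-user dimension with 10,000 users raises the bound to 5,000,000, which is why varying by role or tier is usually the better trade.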

Caching Error Responses

Problem: Default cache-store caches 500 errors for the full TTL.

Fix: Always wrap in <choose> with status code condition (see Section 4.4).

Cache Not Working After Deployment

Checklist:

cache-lookup placement in inbound policy
Request method is GET (cache-lookup ignores non-GET by default)
External cache registration: check APIM > External cache in the Azure portal, or GET the Microsoft.ApiManagement/service/apim-prod/caches/redis-prod ARM resource
Redis connectivity (NSG rules, Private Endpoint)
cache-store in outbound actually executes (not short-circuited)

Large Responses Exceeding Limits

Problem: The internal cache limits the size of a single cached response (2 MiB per the published APIM service limits). Larger responses are not cached, and no error is surfaced.

Fix: Use caching-type="external" for large payloads or compress responses.

CORS Preflight Issues

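Problem: A cached response can carry CORS headers computed for a different origin, and OPTIONS preflight requests should never be answered from the response cache.

Fix: Only cache GET requests explicitly via <choose> condition. Include Origin in vary-by-header.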

10. Production Best Practices

Start with the internal cache; move to Redis for multi-instance production
Cache conditionally: only 2xx responses (optionally 404 with a short TTL)
Set TTLs by data type: reference data (24h), catalog (1h), inventory (30s)
Invalidate on writes; don't rely solely on TTL
Monitor hit ratios; target >80% for read-heavy APIs
Use prefer-external so the same policy works with or without Redis configured
Vary-by minimally: each dimension multiplies the number of entries
Size Redis for the working set and set the allkeys-lru eviction policy
Never cache Set-Cookie responses or user-specific data without proper vary-by
Set downstream-caching-type="none" for authenticated APIs

TTL Guidelines

Data Type	TTL	Invalidation
Reference data (countries, currencies)	24h	Deploy-time refresh
Product catalog	1h	Event-based
Search results	5-15min	TTL only
Inventory/stock	15-30s	TTL only
Auth tokens	Token lifetime - 5min	TTL
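
The "token lifetime minus 5 minutes" rule in the last row can be written as a one-line helper. This Python sketch assumes the lifetime comes from the token response's expires_in field; the function name is illustrative.

```python
def token_cache_ttl(expires_in_seconds: int, safety_margin_seconds: int = 300) -> int:
    """Cache a token for its lifetime minus a safety margin, never below zero."""
    return max(0, expires_in_seconds - safety_margin_seconds)
```

A one-hour token gives a TTL of 3300 seconds. A result of 0 signals that the token is too short-lived to be worth caching.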

Summary

Pattern	Use Case	Complexity
Basic cache-lookup/store	Simple GET APIs	Low
Vary-by strategies	Multiple response variants	Low-Medium
Fragment caching	Tokens, config, partial data	Medium
External Redis	Multi-instance, persistence	Medium
Write-through invalidation	CRUD APIs needing consistency	Medium-High
Stale-while-revalidate	Latency-sensitive, staleness-tolerant	High
Cache warming	Post-deployment critical paths	Medium

The golden rule: Cache aggressively, invalidate precisely, monitor continuously.