APIM — Response Caching and Cache Invalidation Strategies
A comprehensive guide to caching in Azure API Management — from basic response caching to advanced patterns with external Redis, invalidation strategies, and production monitoring.
1. Why Caching Matters in API Management
APIM sits between consumers and backends, making it the ideal layer to intercept repeated requests.
Performance: Cached responses are served in single-digit milliseconds versus hundreds of milliseconds for backend calls. A hit needs no backend network hop, database connection, or backend compute.
Cost: A product catalog API with 95% cache hit ratio reduces backend DB queries by 95%. Fewer requests means lower App Service/AKS/Function costs and reduced egress bandwidth.
Scalability: Cache absorbs traffic spikes (flash sales, viral content) and shields backends from thundering herd problems. External Redis scales independently of APIM instances.
When NOT to cache: Real-time data with zero staleness tolerance, user-specific mutations (POST/PUT/DELETE), responses with sensitive per-user data that could leak, streaming/SSE endpoints.
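The performance and cost claims above reduce to simple arithmetic. The sketch below is illustrative only: the function names and the 5 ms and 200 ms latencies are assumptions chosen for the example, not measurements from APIM.

```python
def backend_requests(total_requests: int, hit_ratio: float) -> int:
    """Requests that still reach the backend at a given cache hit ratio."""
    return round(total_requests * (1.0 - hit_ratio))


def effective_latency_ms(hit_ratio: float, cached_ms: float, backend_ms: float) -> float:
    """Average latency seen by consumers: a weighted mix of hits and misses."""
    return hit_ratio * cached_ms + (1.0 - hit_ratio) * backend_ms


if __name__ == "__main__":
    # Assumed figures: 1,000,000 requests, 95% hit ratio, 5 ms cached, 200 ms backend.
    print(backend_requests(1_000_000, 0.95))
    print(effective_latency_ms(0.95, 5.0, 200.0))
```

With these assumed inputs, only one request in twenty reaches the backend, and the average latency is dominated by the remaining misses, which is why raising the hit ratio matters more than shaving the cached path.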
2. Architecture: Cache Flow
┌──────────────────────────────────────────────────────────────┐
│                     API Consumer Request                     │
└─────────────────────────┬────────────────────────────────────┘
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                 Azure API Management Gateway                 │
│                                                              │
│  INBOUND:  cache-lookup ──► Cache Hit? ──YES──► Return       │
│                                 │                            │
│                                 NO                           │
│                                 ▼                            │
│                         Forward to Backend                   │
│                                 │                            │
│  OUTBOUND: cache-store ◄────────┘                            │
│                                                              │
│  STORAGE:  ┌─────────────────┐  ┌────────────────────────┐   │
│            │ Internal Cache  │  │ External Redis Cache   │   │
│            │ (per-instance)  │  │ (shared, persistent)   │   │
│            └─────────────────┘  └────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘
Flow: Request arrives → cache-lookup checks cache → hit returns immediately → miss forwards to backend → cache-store saves response → subsequent identical requests served from cache until TTL expires.
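The lookup-then-store flow can be modelled in a few lines. This is a minimal in-memory sketch, not APIM's implementation: `TtlCache`, `handle`, and the injected clock are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Tiny in-memory stand-in for the gateway cache (not APIM's implementation)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def lookup(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def store(self, key: str, value: str, ttl_seconds: float) -> None:
        self._entries[key] = (self._clock() + ttl_seconds, value)


def handle(cache: TtlCache, key: str, backend: Callable[[], str],
           ttl_seconds: float) -> Tuple[str, str]:
    """Return (body, 'HIT' or 'MISS'), mirroring cache-lookup then cache-store."""
    cached = cache.lookup(key)
    if cached is not None:
        return cached, "HIT"
    body = backend()  # miss: forward to the backend
    cache.store(key, body, ttl_seconds)
    return body, "MISS"
```

Passing the clock in as a parameter keeps the sketch testable: a fake clock can be advanced past the TTL to observe the entry expiring without waiting.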
3. Types of Caching in APIM
Internal Cache (Built-in)
Per-instance, in-memory cache. Zero setup, sub-millisecond latency. Limited capacity (Developer: 10MB, Standard: 1GB, Premium: 5GB per unit). Cache lost on restart, not shared across instances.
External Cache (Azure Redis)
Shared across all APIM instances and regions. Survives restarts, supports up to 1.2TB (Premium clustered), enables advanced features like pub/sub invalidation. Requires provisioning and connection setup.
Use Case Scenarios
| Scenario | Use |
|---|---|
| Single instance, low traffic | Internal |
| Multi-instance / multi-region | External Redis |
| Cache entries > 1MB | External Redis |
| Need persistence across restarts | External Redis |
| Sub-millisecond latency critical | Internal |
4. Step-by-Step Implementation
4.1 Basic Response Caching
<policies>
<inbound>
<base />
<cache-lookup vary-by-developer="false"
vary-by-developer-groups="false"
caching-type="prefer-external"
downstream-caching-type="none">
<vary-by-header>Accept</vary-by-header>
<vary-by-query-parameter>api-version</vary-by-query-parameter>
</cache-lookup>
</inbound>
<backend>
<base />
</backend>
<outbound>
<base />
<cache-store duration="3600" />
</outbound>
<on-error>
<base />
</on-error>
</policies>
Key attributes: duration (TTL seconds), caching-type (internal/external/prefer-external), downstream-caching-type (controls Cache-Control header: none/private/public).
4.2 Vary-By Strategies
<cache-lookup vary-by-developer="false" vary-by-developer-groups="false">
<!-- Separate entries per content type and language -->
<vary-by-header>Accept</vary-by-header>
<vary-by-header>Accept-Language</vary-by-header>
<!-- Pagination and filtering -->
<vary-by-query-parameter>page</vary-by-query-parameter>
<vary-by-query-parameter>pageSize</vary-by-query-parameter>
<vary-by-query-parameter>category</vary-by-query-parameter>
<!-- Wildcard: vary by ALL query parameters -->
<vary-by-query-parameter>*</vary-by-query-parameter>
<!-- Per-developer subscription cache: set vary-by-developer="true" on the cache-lookup element -->
</cache-lookup>
Custom cache key with C# expression:
<cache-lookup vary-by-developer="false" vary-by-developer-groups="false">
<vary-by-header>Accept</vary-by-header>
<vary-by-custom>@{
var tenantId = context.Request.Headers
.GetValueOrDefault("X-Tenant-Id", "default");
var role = context.Request.Headers
.GetValueOrDefault("X-User-Role", "anonymous");
return $"{tenantId}:{role}";
}</vary-by-custom>
</cache-lookup>
4.3 Cache Key Composition
Default key components: HTTP method + URL path + vary-by-query-parameter values + vary-by-header values + developer identity.
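To make the composition concrete, here is a small model of how such a key could be assembled. It is a sketch under assumptions: APIM's real internal key format is not documented here, and `cache_key` with its parameters is a hypothetical helper.

```python
import hashlib
from typing import Iterable, Mapping


def cache_key(method: str, path: str,
              query: Mapping[str, str], headers: Mapping[str, str],
              vary_by_query: Iterable[str] = (),
              vary_by_header: Iterable[str] = (),
              developer_id: str = "") -> str:
    """Build a deterministic key from the components the section lists."""
    parts = [method.upper(), path]
    # Sort names so parameter order in the URL does not create duplicate entries.
    for name in sorted(vary_by_query):
        parts.append(f"q:{name}={query.get(name, '')}")
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    for name in sorted(h.lower() for h in vary_by_header):
        parts.append(f"h:{name}={lowered.get(name, '')}")
    if developer_id:
        parts.append(f"dev:{developer_id}")
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

Two requests that differ only in query-parameter order or header-name casing map to the same key, while a different `Accept` value produces a separate entry.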
Geographic-aware key:
<vary-by-custom>@{
var region = context.Request.Headers
.GetValueOrDefault("X-Forwarded-Region", context.Deployment.Region);
var currency = context.Request.Headers
.GetValueOrDefault("X-Currency", "USD");
return $"{region}:{currency}";
}</vary-by-custom>
4.4 Conditional Caching Based on Response Codes
<outbound>
<base />
<choose>
<when condition="@(context.Response.StatusCode == 200)">
<cache-store duration="3600" />
</when>
<when condition="@(context.Response.StatusCode == 404)">
<cache-store duration="300" />
</when>
<!-- All other 4xx and 5xx responses fall through and are not cached -->
</choose>
</outbound>
Dynamic TTL respecting backend Cache-Control:
<outbound>
<base />
<choose>
<when condition="@(context.Response.StatusCode >= 200 && context.Response.StatusCode < 300)">
<cache-store duration="@{
var cc = context.Response.Headers.GetValueOrDefault("Cache-Control", "");
var maxAge = cc.Split(',').Select(s => s.Trim())
.FirstOrDefault(s => s.StartsWith("max-age="));
// Fall back to 1 hour when max-age is missing or is not a valid integer
int seconds;
return maxAge != null && int.TryParse(maxAge.Split('=')[1], out seconds) ? seconds : 3600;
}" />
</when>
</choose>
</outbound>
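The header parsing in the expression above can be checked in isolation. The following sketch mirrors that logic outside the gateway; `ttl_from_cache_control` is a hypothetical helper, and the 3600-second default simply matches the policy.

```python
def ttl_from_cache_control(header: str, default_seconds: int = 3600) -> int:
    """Return max-age from a Cache-Control value, or the default when absent or invalid."""
    for directive in header.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            value = value.strip().strip('"')
            if value.isascii() and value.isdigit():
                return int(value)
            return default_seconds  # malformed max-age: do not guess
    return default_seconds
```

Treating a malformed `max-age` as "use the default" avoids failing the whole response because a backend emitted a bad header.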
4.5 Fragment Caching (cache-lookup-value / cache-store-value)
Fragment caching stores individual values — useful for tokens, config, or partial data:
<inbound>
<base />
<cache-lookup-value key="@("backend-token:" +
context.Request.Headers.GetValueOrDefault("X-Tenant-Id", "default"))"
variable-name="cachedToken"
caching-type="prefer-external" />
<choose>
<when condition="@(!context.Variables.ContainsKey("cachedToken"))">
<send-request mode="new" response-variable-name="tokenResponse">
<set-url>https://login.microsoftonline.com/tenant/oauth2/v2.0/token</set-url>
<set-method>POST</set-method>
<set-header name="Content-Type" exists-action="override">
<value>application/x-www-form-urlencoded</value>
</set-header>
<set-body>@{
return "grant_type=client_credentials&client_id={{client-id}}&client_secret={{client-secret}}&scope={{scope}}";
}</set-body>
</send-request>
<set-variable name="cachedToken"
value="@(((IResponse)context.Variables["tokenResponse"]).Body.As<JObject>()["access_token"].ToString())" />
<cache-store-value key="@("backend-token:" +
context.Request.Headers.GetValueOrDefault("X-Tenant-Id", "default"))"
value="@((string)context.Variables["cachedToken"])"
duration="3300"
caching-type="prefer-external" />
</when>
</choose>
<set-header name="Authorization" exists-action="override">
<value>@("Bearer " + (string)context.Variables["cachedToken"])</value>
</set-header>
</inbound>
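The `duration="3300"` above is consistent with a 3600-second token lifetime minus a five-minute safety margin; the lifetime itself is an assumption, since it depends on the identity provider. A minimal sketch of that calculation and of the per-tenant key, with hypothetical helper names:

```python
def token_cache_ttl(expires_in_seconds: int, safety_margin_seconds: int = 300) -> int:
    """Seconds to cache a token so it is evicted before the issuer expires it."""
    return max(expires_in_seconds - safety_margin_seconds, 0)


def token_cache_key(tenant_id: str = "default") -> str:
    """Mirror the key used in the policy: one cached token per tenant."""
    return "backend-token:" + tenant_id
```

A result of 0 means the token is too short-lived to be worth caching. Deriving the TTL from the `expires_in` field of the token response, rather than hard-coding it, keeps the cache correct if the issuer changes lifetimes.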
5. External Cache with Azure Redis
5.1 Provisioning
az group create --name rg-apim-cache --location eastus2
# The non-TLS port (6379) stays disabled unless --enable-non-ssl-port is passed
az redis create \
--name redis-apim-cache-prod \
--resource-group rg-apim-cache \
--location eastus2 \
--sku Premium --vm-size p1 \
--minimum-tls-version 1.2 \
--redis-configuration '{"maxmemory-policy": "allkeys-lru"}'
5.2 Connecting to APIM
REDIS_HOST=$(az redis show --name redis-apim-cache-prod \
--resource-group rg-apim-cache --query hostName -o tsv)
REDIS_KEY=$(az redis list-keys --name redis-apim-cache-prod \
--resource-group rg-apim-cache --query primaryKey -o tsv)
az apim cache create \
--resource-group rg-apim \
--service-name apim-prod \
--cache-id redis-prod \
--connection-string "${REDIS_HOST}:6380,password=${REDIS_KEY},ssl=True,abortConnect=False" \
--description "Production Redis Cache" \
--use-from "default"
5.3 Bicep Configuration
resource apimCache 'Microsoft.ApiManagement/service/caches@2023-05-01-preview' = {
parent: apimService
name: 'redis-prod'
properties: {
connectionString: '${redisHost}:6380,password=${redisKey},ssl=True,abortConnect=False'
useFromLocation: 'default'
description: 'Production external cache'
}
}
Use caching-type="prefer-external" in policies: the gateway uses the external cache when one is configured and the built-in cache otherwise, so the same policy works in environments with and without Redis.
6. Cache Invalidation Patterns
6.1 Time-Based (TTL)
<!-- Dynamic TTL based on data type -->
<cache-store duration="@{
var path = context.Request.Url.Path;
if (path.Contains("/reference/")) return 86400; // 24h
if (path.Contains("/catalog/")) return 3600; // 1h
if (path.Contains("/inventory/")) return 30; // 30s
return 300; // 5min default
}" />
6.2 Event-Based Invalidation
<!-- Invalidation endpoint: POST /cache/invalidate -->
<inbound>
<base />
<validate-jwt header-name="Authorization" require-scheme="Bearer">
<required-claims>
<claim name="roles" match="any">
<value>cache-admin</value>
</claim>
</required-claims>
</validate-jwt>
<set-variable name="req" value="@(context.Request.Body.As<JObject>())" />
<cache-remove-value key="@(((JObject)context.Variables["req"])["cacheKey"].ToString())"
caching-type="prefer-external" />
<return-response>
<set-status code="204" reason="Cache Invalidated" />
</return-response>
</inbound>
6.3 Manual Purge via API
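# Selective purge with redis-cli, reusing REDIS_HOST and REDIS_KEY from section 5.2.
# SCAN walks the keyspace incrementally instead of blocking Redis the way KEYS does.
# The key pattern is illustrative: inspect the actual key names in your cache first.
redis-cli -h "$REDIS_HOST" -p 6380 --tls -a "$REDIS_KEY" --scan --pattern 'apim:products:*' \
| xargs -r -n 100 redis-cli -h "$REDIS_HOST" -p 6380 --tls -a "$REDIS_KEY" DEL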
6.4 Invalidation on Write Operations
<policies>
<inbound>
<base />
<choose>
<when condition="@(context.Request.Method == "GET")">
<cache-lookup vary-by-developer="false"
vary-by-developer-groups="false"
caching-type="prefer-external">
<vary-by-query-parameter>*</vary-by-query-parameter>
</cache-lookup>
</when>
</choose>
</inbound>
<outbound>
<base />
<choose>
<when condition="@(context.Request.Method == "GET" && context.Response.StatusCode == 200)">
<cache-store duration="3600" />
</when>
<when condition="@((context.Request.Method == "PUT" ||
context.Request.Method == "POST" ||
context.Request.Method == "DELETE") &&
context.Response.StatusCode >= 200 &&
context.Response.StatusCode < 300)">
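<!-- Invalidate the specific resource. cache-remove-value only removes entries written with
cache-store-value under the same key; responses saved by cache-store expire by TTL. -->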
<cache-remove-value key="@($"response-cache:{context.Request.Url.Path}")"
caching-type="prefer-external" />
<!-- Invalidate the collection endpoint -->
<cache-remove-value key="@{
var segments = context.Request.Url.Path.Split('/');
return $"response-cache:{string.Join("/", segments.Take(segments.Length - 1))}";
}" caching-type="prefer-external" />
</when>
</choose>
</outbound>
</policies>
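The two keys removed on a write are derived from the request path: the resource itself and its parent collection. The sketch below reproduces that derivation so edge cases can be checked offline; `keys_to_invalidate` is a hypothetical helper and the `response-cache:` prefix is taken from the policy.

```python
from typing import List


def keys_to_invalidate(path: str, prefix: str = "response-cache:") -> List[str]:
    """Return the resource key and its parent collection key for a written path."""
    trimmed = path.rstrip("/") or "/"
    keys = [prefix + trimmed]
    parent = trimmed.rsplit("/", 1)[0]
    # Top-level paths have no parent collection to invalidate.
    if parent and parent != trimmed:
        keys.append(prefix + parent)
    return keys
```

For a POST to the collection itself (for example `/api/products`), the first key is already the collection, and the second key points one level higher, which may be unnecessary but is harmless.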
7. Advanced Patterns
7.1 Cache Warming
Pre-populate cache after deployments or flushes:
<!-- POST /internal/warm-cache -->
<inbound>
<base />
<validate-jwt header-name="Authorization" require-scheme="Bearer">
<required-claims>
<claim name="roles" match="any"><value>cache-admin</value></claim>
</required-claims>
</validate-jwt>
<send-request mode="new" response-variable-name="productsResp">
<set-url>https://backend-api.internal/api/products?page=1&amp;pageSize=100</set-url>
<set-method>GET</set-method>
</send-request>
<!-- Stored as a fragment value: read it back with cache-lookup-value under the same key -->
<cache-store-value
key="response-cache:/api/products?page=1&amp;pageSize=100"
value="@(((IResponse)context.Variables["productsResp"]).Body.As<string>())"
duration="3600" caching-type="prefer-external" />
<return-response>
<set-status code="200" reason="Cache Warmed" />
</return-response>
</inbound>
7.2 Stale-While-Revalidate
Serve stale cache while refreshing in the background:
<inbound>
<base />
<cache-lookup-value key="@($"data:{context.Request.Url.Path}")"
variable-name="cachedData" caching-type="prefer-external" />
<cache-lookup-value key="@($"ts:{context.Request.Url.Path}")"
variable-name="cachedTs" caching-type="prefer-external" />
<choose>
<!-- Fresh data (< 5 min old): return immediately -->
<when condition="@{
if (!context.Variables.ContainsKey("cachedData") ||
!context.Variables.ContainsKey("cachedTs")) return false;
var age = DateTimeOffset.UtcNow.ToUnixTimeSeconds() -
long.Parse((string)context.Variables["cachedTs"]);
return age < 300;
}">
<return-response>
<set-status code="200" reason="OK" />
<set-header name="X-Cache" exists-action="override">
<value>HIT-FRESH</value>
</set-header>
<set-body>@((string)context.Variables["cachedData"])</set-body>
</return-response>
</when>
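<!-- Stale or missing (5 min or older): the request continues to the backend and the outbound
section refreshes both entries. Serving the stale copy during the 5-15 min grace window would
also require an asynchronous refresh call, which this policy does not perform. -->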
</choose>
</inbound>
<outbound>
<base />
<cache-store-value key="@($"data:{context.Request.Url.Path}")"
value="@(context.Response.Body.As<string>(preserveContent: true))"
duration="1800" caching-type="prefer-external" />
<cache-store-value key="@($"ts:{context.Request.Url.Path}")"
value="@(DateTimeOffset.UtcNow.ToUnixTimeSeconds().ToString())"
duration="1800" caching-type="prefer-external" />
</outbound>
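The decision at the heart of stale-while-revalidate is a three-way classification of an entry by age. The sketch below shows that decision only; `classify` is a hypothetical helper, the 300-second freshness window matches the policy, and the 900-second grace limit is an assumed value.

```python
def classify(age_seconds: float, fresh_for: float = 300, grace_until: float = 900) -> str:
    """Classify a cached entry by age: 'fresh', 'stale' (serve and refresh), or 'expired'."""
    if age_seconds < 0:
        raise ValueError("age_seconds must be non-negative")
    if age_seconds < fresh_for:
        return "fresh"
    if age_seconds < grace_until:
        return "stale"
    return "expired"
```

A caller would return the cached body for both `fresh` and `stale`, trigger a background refresh only for `stale`, and go to the backend synchronously for `expired`.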
7.3 Cache Bypass for Specific Clients
<inbound>
<base />
<choose>
<when condition="@(context.Request.Headers
.GetValueOrDefault("Cache-Control", "") == "no-cache")">
<!-- Skip cache-lookup -->
</when>
<otherwise>
<cache-lookup vary-by-developer="false" vary-by-developer-groups="false"
caching-type="prefer-external">
<vary-by-header>Accept</vary-by-header>
<vary-by-query-parameter>*</vary-by-query-parameter>
</cache-lookup>
</otherwise>
</choose>
</inbound>
<outbound>
<base />
<choose>
<when condition="@(context.Request.Headers
.GetValueOrDefault("Cache-Control", "") != "no-cache")">
<cache-store duration="3600" />
</when>
</choose>
</outbound>
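The policy above compares the header to the exact string `no-cache`, so a value such as `no-cache, no-store` would not trigger the bypass. A directive-aware check might look like the following sketch; `requests_bypass` is a hypothetical helper name.

```python
def requests_bypass(cache_control: str) -> bool:
    """True when the client's Cache-Control header contains a no-cache directive."""
    directives = {
        d.strip().lower().partition("=")[0]
        for d in cache_control.split(",")
    }
    return "no-cache" in directives
```

Because anyone who can set this header can force backend traffic, it may be worth restricting the bypass to trusted subscriptions.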
7.4 Per-User vs Shared Caching
Shared (default): Same response for all users — use for public/anonymous data.
Per-user: Vary by user identity extracted from JWT:
<cache-lookup vary-by-developer="false" vary-by-developer-groups="false">
<vary-by-custom>@{
var auth = context.Request.Headers.GetValueOrDefault("Authorization", "");
if (string.IsNullOrEmpty(auth)) return "anonymous";
var token = auth.Replace("Bearer ", "");
var parts = token.Split('.');
if (parts.Length != 3) return "unknown";
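// JWT segments are base64url-encoded: map to standard base64, then restore padding
var b64 = parts[1].Replace('-', '+').Replace('_', '/');
var payload = System.Text.Encoding.UTF8.GetString(
Convert.FromBase64String(b64.PadRight(
b64.Length + (4 - b64.Length % 4) % 4, '=')));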
return JObject.Parse(payload)["sub"]?.ToString() ?? "unknown";
}</vary-by-custom>
</cache-lookup>
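The claim extraction can be exercised outside the gateway to confirm the base64url handling. This sketch reads the `sub` claim without verifying the signature, so it assumes the token has already been validated elsewhere (for example by validate-jwt); `subject_from_bearer` is a hypothetical helper.

```python
import base64
import json


def subject_from_bearer(authorization: str) -> str:
    """Extract the `sub` claim for use as a cache-key dimension (no signature check)."""
    if not authorization:
        return "anonymous"
    token = authorization[7:] if authorization.startswith("Bearer ") else authorization
    parts = token.split(".")
    if len(parts) != 3:
        return "unknown"
    segment = parts[1]
    # base64url omits padding; restore it before decoding.
    segment += "=" * (-len(segment) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(segment))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return "unknown"
    sub = claims.get("sub") if isinstance(claims, dict) else None
    return str(sub) if sub is not None else "unknown"
```

Keying on an unverified claim is only safe when an earlier policy has rejected forged tokens; otherwise a caller could choose another user's cache partition.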
8. Monitoring Cache Performance
KQL: Cache Hit Ratio Over Time
ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| where Method == "GET"
| extend CacheHit = BackendTime == 0 or isnull(BackendTime)
| summarize
TotalRequests = count(),
CacheHits = countif(CacheHit),
HitRatio = round(100.0 * countif(CacheHit) / count(), 2)
by bin(TimeGenerated, 1h)
| order by TimeGenerated desc
KQL: Performance by API Operation
ApiManagementGatewayLogs
| where TimeGenerated > ago(7d)
| where Method == "GET"
| extend CacheHit = BackendTime == 0 or isnull(BackendTime)
| summarize
Requests = count(),
HitRatio = round(100.0 * countif(CacheHit) / count(), 2),
AvgLatencyMs = round(avg(TotalTime), 1),
P95LatencyMs = round(percentile(TotalTime, 95), 1)
by ApiId, OperationId
| order by Requests desc
KQL: Latency — Cached vs Uncached
ApiManagementGatewayLogs
| where TimeGenerated > ago(24h)
| where Method == "GET"
| extend CacheHit = BackendTime == 0 or isnull(BackendTime)
| summarize
CachedAvgMs = round(avgif(TotalTime, CacheHit), 2),
UncachedAvgMs = round(avgif(TotalTime, not(CacheHit)), 2),
SpeedupFactor = round(avgif(TotalTime, not(CacheHit)) / avgif(TotalTime, CacheHit), 1)
by ApiId
KQL: Redis Memory Monitoring
AzureMetrics
| where ResourceProvider == "MICROSOFT.CACHE"
| where MetricName in ("usedmemory", "cachehits", "cachemisses")
| where TimeGenerated > ago(24h)
| summarize AvgValue = avg(Average) by bin(TimeGenerated, 5m), MetricName
| render timechart
Azure Monitor Alert
az monitor metrics alert create \
--name "low-cache-hit-ratio" \
--resource-group rg-apim \
--scopes "/subscriptions/{sub}/resourceGroups/rg-apim/providers/Microsoft.ApiManagement/service/apim-prod" \
--condition "avg CacheHitCount < 70" \
--window-size 15m \
--evaluation-frequency 5m
9. Common Pitfalls and Troubleshooting
Caching Authenticated Responses Across Users
Problem: vary-by-developer="false" with user-specific data → User A sees User B's data.
Fix: Use vary-by-developer="true" or vary by Authorization header / JWT claim.
Cache Key Explosion
Problem: Too many vary-by dimensions → millions of entries, low hit ratio, Redis memory bloat.
Fix: Reduce cardinality. Vary by role/tier instead of unique user token. Normalize keys.
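The growth is multiplicative: the upper bound on entries per URL is the product of the distinct values in each vary-by dimension. A small worked sketch, with made-up cardinalities and a hypothetical helper name:

```python
from math import prod
from typing import Mapping


def max_cache_entries(distinct_values: Mapping[str, int]) -> int:
    """Upper bound on entries per URL: the product of distinct values per dimension."""
    return prod(distinct_values.values())


if __name__ == "__main__":
    base = {"Accept": 2, "Accept-Language": 5, "page": 50}
    print(max_cache_entries(base))                           # by format, language, page
    print(max_cache_entries({**base, "user-token": 10_000}))  # adding a per-user dimension
    print(max_cache_entries({**base, "role": 3}))             # varying by role instead
```

Swapping a per-user dimension for a low-cardinality one such as role keeps the bound in the thousands rather than the millions, which is the point of the fix above.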
Caching Error Responses
Problem: Default cache-store caches 500 errors for the full TTL.
Fix: Always wrap in <choose> with status code condition (see Section 4.4).
Cache Not Working After Deployment
Checklist:
- cache-lookup placement in inbound policy
- Request method is GET (cache-lookup ignores non-GET by default)
- External cache connection: az apim cache show --service-name apim-prod --cache-id redis-prod
- Redis connectivity (NSG rules, Private Endpoint)
- cache-store in outbound actually executes (not short-circuited)
Large Responses Exceeding Limits
Problem: Internal cache has ~256KB per-entry limit. Large responses silently fail.
Fix: Use caching-type="external" for large payloads or compress responses.
CORS Preflight Issues
Fix: Only cache GET requests explicitly via <choose> condition. Include Origin in vary-by-header.
10. Production Best Practices
- Start internal, graduate to Redis for multi-instance production
- Always cache conditionally: only 2xx responses (optionally 404 with a short TTL)
- Set TTLs by data type — reference data (24h), catalog (1h), inventory (30s)
- Invalidate on writes — don't rely solely on TTL
- Monitor hit ratios — target >80% for read-heavy APIs
- Use prefer-external for resilience
- Vary-by minimally: each dimension multiplies entries exponentially
- Size Redis with allkeys-lru eviction policy
- Never cache Set-Cookie responses or user-specific data without proper vary-by
- Set downstream-caching-type="none" for authenticated APIs
TTL Guidelines
| Data Type | TTL | Invalidation |
|---|---|---|
| Reference data (countries, currencies) | 24h | Deploy-time refresh |
| Product catalog | 1h | Event-based |
| Search results | 5-15min | TTL only |
| Inventory/stock | 15-30s | TTL only |
| Auth tokens | Token lifetime - 5min | TTL |
Summary
| Pattern | Use Case | Complexity |
|---|---|---|
| Basic cache-lookup/store | Simple GET APIs | Low |
| Vary-by strategies | Multiple response variants | Low-Medium |
| Fragment caching | Tokens, config, partial data | Medium |
| External Redis | Multi-instance, persistence | Medium |
| Write-through invalidation | CRUD APIs needing consistency | Medium-High |
| Stale-while-revalidate | Latency-sensitive, staleness-tolerant | High |
| Cache warming | Post-deployment critical paths | Medium |
The golden rule: Cache aggressively, invalidate precisely, monitor continuously.