Azure Functions Error Handling & Dead Letter Queue
Building Resilient Message Processing
Introduction
Robust error handling is essential when building reliable event-driven applications with Azure Functions. When processing messages from Service Bus, Azure Storage Queues, or other messaging systems, failures are inevitable — network timeouts, transient errors, business logic failures, and malformed messages can all cause processing to fail. Without proper error handling, messages can be lost, processed multiple times, or block downstream operations.
This guide covers:
- Error handling patterns — Catching, logging, and recovering from failures
- Retry policies — Configuring automatic retries and backoff strategies
- Dead Letter Queues — Handling messages that cannot be processed
- DLQ processing — Investigating and recovering from failed messages
- Monitoring — Tracking errors and debugging issues
Understanding Error Flow
How Errors Occur in Message Processing
┌────────────────────────────────────────────────────────────┐
│               MESSAGE PROCESSING ERROR FLOW                │
├────────────────────────────────────────────────────────────┤
│                                                            │
│    ┌─────────────┐                                         │
│    │   Message   │                                         │
│    │   arrives   │                                         │
│    └──────┬──────┘                                         │
│           ▼                                                │
│    ┌─────────────┐                                         │
│    │  Function   │                                         │
│    │  processes  │                                         │
│    └──────┬──────┘                                         │
│           │                                                │
│     ┌─────┴─────┐                                          │
│     │           │                                          │
│  Success     Failure                                       │
│     │           │                                          │
│     ▼           ▼                                          │
│ Complete   ┌────────────────────────────────┐              │
│  Message   │         ERROR HANDLING         │              │
│            │                                │              │
│            │  1. Log error details          │              │
│            │  2. Check retry count          │              │
│            │  3. Retry if under limit       │              │
│            │  4. Dead-letter if exhausted   │              │
│            │  5. Continue processing        │              │
│            └────────────────────────────────┘              │
│                                                            │
└────────────────────────────────────────────────────────────┘
Retry Behavior
┌────────────────────────────────────────────────────────────┐
│                      RETRY MECHANISM                       │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  Message Processing Fails                                  │
│             │                                              │
│             ▼                                              │
│  ┌──────────────────────────────────────┐                  │
│  │         Check Delivery Count         │                  │
│  │                                      │                  │
│  │  deliveryCount = 1 ───► Retry #1     │                  │
│  │  deliveryCount = 2 ───► Retry #2     │                  │
│  │  deliveryCount = 3 ───► Retry #3     │                  │
│  │  deliveryCount = 4 ───► Max reached  │──► Dead Letter   │
│  │                                      │                  │
│  └──────────────────────────────────────┘                  │
│                                                            │
└────────────────────────────────────────────────────────────┘
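The decision in the diagram above can be sketched as a small pure function. This is only an illustration: `FailureOutcome` and `DeliveryCountPolicy` are hypothetical names introduced here, and the sketch assumes the delivery count is 1 on the first delivery and that a message is dead-lettered once the count reaches the configured maximum.

```csharp
using System;

// Possible outcomes for a failed delivery in this sketch (hypothetical type).
public enum FailureOutcome { Retry, DeadLetter }

public static class DeliveryCountPolicy
{
    // deliveryCount is assumed to be 1 on the first delivery and to grow by one
    // on each redelivery; maxDeliveryCount is the limit configured on the queue.
    public static FailureOutcome Decide(int deliveryCount, int maxDeliveryCount)
    {
        if (deliveryCount < 1) throw new ArgumentOutOfRangeException(nameof(deliveryCount));
        if (maxDeliveryCount < 1) throw new ArgumentOutOfRangeException(nameof(maxDeliveryCount));

        return deliveryCount >= maxDeliveryCount
            ? FailureOutcome.DeadLetter
            : FailureOutcome.Retry;
    }
}
```

With a maximum of 4, as in the diagram, `Decide(3, 4)` yields `Retry` and `Decide(4, 4)` yields `DeadLetter`.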
Handle Errors in Function Code
Basic Error Handling Pattern
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;
public class OrderProcessingFunction
{
    private readonly ILogger<OrderProcessingFunction> _logger;
    public OrderProcessingFunction(ILogger<OrderProcessingFunction> logger)
    {
        _logger = logger;
    }
    [Function("ProcessOrder")]
    public async Task Run(
        [ServiceBusTrigger("orders-queue", Connection = "ServiceBusConnection", AutoCompleteMessages = false)]
        ServiceBusReceivedMessage message,
        ServiceBusMessageActions messageActions)
    {
        try
        {
            _logger.LogInformation("Processing message: {MessageId}", message.MessageId);
            // Parse the message
            var order = JsonSerializer.Deserialize<Order>(message.Body.ToString());
            if (order == null)
            {
                throw new ValidationException("Invalid order payload - null");
            }
            // Validate order
            if (string.IsNullOrEmpty(order.CustomerId))
            {
                throw new ValidationException("CustomerId is required");
            }
            // Process the order (business logic)
            await ProcessOrderInternalAsync(order);
            // Settle explicitly because AutoCompleteMessages is disabled
            await messageActions.CompleteMessageAsync(message);
            _logger.LogInformation("Order {OrderId} processed successfully", order.OrderId);
        }
        catch (JsonException ex)
        {
            // Malformed message - a retry cannot fix it, so dead-letter it immediately.
            // Merely throwing would abandon the message and trigger redelivery instead.
            _logger.LogError(ex, "Invalid JSON in message {MessageId}", message.MessageId);
            await DeadLetterAsync(message, messageActions, ex);
        }
        catch (ValidationException ex)
        {
            // Validation failure - the same payload fails on every attempt, so dead-letter it
            _logger.LogWarning(ex, "Validation failed for message {MessageId}: {Message}",
                message.MessageId, ex.Message);
            await DeadLetterAsync(message, messageActions, ex);
        }
        catch (ExternalServiceException ex) when (ex.IsTransient)
        {
            // Transient error - rethrow so the message is abandoned and redelivered
            _logger.LogError(ex, "Transient error processing order, will retry");
            throw;
        }
        catch (Exception ex)
        {
            // All other errors - log and rethrow; Service Bus dead-letters the
            // message once the queue's max delivery count is reached
            _logger.LogError(ex, "Unexpected error processing message {MessageId}", message.MessageId);
            throw;
        }
    }
    // Dead-letters the message and records why, so a DLQ processor can triage it
    private static Task DeadLetterAsync(
        ServiceBusReceivedMessage message, ServiceBusMessageActions messageActions, Exception ex)
    {
        return messageActions.DeadLetterMessageAsync(message, new Dictionary<string, object>
        {
            { "ErrorMessage", ex.Message },
            { "ErrorType", ex.GetType().Name }
        });
    }
    private async Task ProcessOrderInternalAsync(Order order)
    {
        // Simulate business logic processing
        await Task.Delay(100);
        // Always throws a transient error so the retry path can be observed when testing
        throw new ExternalServiceException("Service unavailable", true);
    }
}
Custom Exception Types
// Exception for external service failures; IsTransient marks the ones worth retrying
public class ExternalServiceException : Exception
{
    public bool IsTransient { get; }
    public ExternalServiceException(string message, bool isTransient)
        : base(message)
    {
        IsTransient = isTransient;
    }
}
// Exception for business validation failures
public class ValidationException : Exception
{
    public ValidationException(string message) : base(message) { }
}
Configure Dead Letter Queue
Queue-Level Configuration
// The max delivery count and the dead-letter destination are properties of the
// queue itself, not of the trigger attribute, so they are configured on the
// Service Bus entity (see the ARM template and CLI commands in this section).
[Function("ProcessWithDLQ")]
public async Task Run(
    [ServiceBusTrigger(
        "orders-queue",
        Connection = "ServiceBusConnection",
        AutoCompleteMessages = false)]
    ServiceBusReceivedMessage message,
    ServiceBusMessageActions messageActions)
{
    // Keep in sync with maxDeliveryCount on the queue
    const int maxDeliveryCount = 5;
    try
    {
        var order = JsonSerializer.Deserialize<Order>(message.Body.ToString());
        await ProcessOrderAsync(order);
        // Successfully processed - complete the message
        await messageActions.CompleteMessageAsync(message);
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Failed to process message {MessageId}", message.MessageId);
        // Check if we should dead-letter or retry
        if (message.DeliveryCount >= maxDeliveryCount)
        {
            // Max retries reached - dead-letter explicitly so the error details travel with the message
            _logger.LogWarning("Max delivery count reached, dead-lettering message");
            await messageActions.DeadLetterMessageAsync(message, new Dictionary<string, object>
            {
                { "ErrorMessage", ex.Message },
                { "ErrorType", ex.GetType().Name },
                { "FailedAt", DateTime.UtcNow.ToString("o") },
                { "RetryCount", message.DeliveryCount }
            });
        }
        else
        {
            // Abandon message for retry
            await messageActions.AbandonMessageAsync(message, new Dictionary<string, object>
            {
                { "ErrorMessage", ex.Message },
                { "LastAttempt", DateTime.UtcNow.ToString("o") }
            });
        }
    }
}
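The dead-letter properties built inline above could be factored into a helper so every function records failures the same way. The sketch below is an assumption-laden illustration: `DeadLetterProperties` is a hypothetical helper, and the 1024-character cap on the error text is an arbitrary value chosen for this example rather than a documented Service Bus limit.

```csharp
using System;
using System.Collections.Generic;

public static class DeadLetterProperties
{
    // Illustrative cap chosen for this sketch; not a documented Service Bus limit.
    private const int MaxErrorLength = 1024;

    // Builds the same property names the function above attaches when dead-lettering.
    public static Dictionary<string, object> Build(Exception ex, int deliveryCount, DateTimeOffset failedAt)
    {
        var errorMessage = ex.Message ?? string.Empty;
        if (errorMessage.Length > MaxErrorLength)
        {
            // Truncate very long messages so the properties stay small
            errorMessage = errorMessage.Substring(0, MaxErrorLength);
        }

        return new Dictionary<string, object>
        {
            ["ErrorMessage"] = errorMessage,
            ["ErrorType"] = ex.GetType().Name,
            ["FailedAt"] = failedAt.UtcDateTime.ToString("o"),
            ["RetryCount"] = deliveryCount
        };
    }
}
```

Passing the timestamp in as a parameter, rather than reading the clock inside the helper, keeps the function deterministic and easy to unit test.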
ARM Template Configuration
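{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.ServiceBus/namespaces/queues",
      "apiVersion": "2022-10-01-preview",
      "name": "my-namespace/orders-dlq",
      "properties": {}
    },
    {
      "type": "Microsoft.ServiceBus/namespaces/queues",
      "apiVersion": "2022-10-01-preview",
      "name": "my-namespace/orders-queue",
      "dependsOn": [
        "[resourceId('Microsoft.ServiceBus/namespaces/queues', 'my-namespace', 'orders-dlq')]"
      ],
      "properties": {
        "lockDuration": "PT30S",
        "maxDeliveryCount": 5,
        "deadLetteringOnMessageExpiration": true,
        "forwardDeadLetteredMessagesTo": "orders-dlq"
      }
    }
  ]
}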
Azure CLI
# Create the queue that receives forwarded dead-letter messages first
az servicebus queue create \
    --name orders-dlq \
    --namespace-name my-namespace \
    --resource-group my-rg
# Create the main queue; dead-lettered messages are forwarded to orders-dlq
az servicebus queue create \
    --name orders-queue \
    --namespace-name my-namespace \
    --resource-group my-rg \
    --lock-duration "PT30S" \
    --max-delivery-count 5 \
    --enable-dead-lettering-on-message-expiration true \
    --forward-dead-lettered-messages-to orders-dlq
Process Dead Letter Queue
DLQ Processor Function
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Identity;
using Azure.Messaging.ServiceBus;
using Azure.Storage.Blobs;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;
public class DeadLetterProcessor
{
    private readonly ILogger<DeadLetterProcessor> _logger;
    private readonly IMessageRepository _messageRepository;
    private readonly IAlertService _alertService;
    public DeadLetterProcessor(
        ILogger<DeadLetterProcessor> logger,
        IMessageRepository messageRepository,
        IAlertService alertService)
    {
        _logger = logger;
        _messageRepository = messageRepository;
        _alertService = alertService;
    }
    [Function("ProcessDeadLetter")]
    public async Task Run(
        // orders-dlq is an ordinary queue without sessions, so IsSessionsEnabled is not set
        [ServiceBusTrigger(
            "orders-dlq",
            Connection = "ServiceBusConnection")]
        ServiceBusReceivedMessage message)
    {
        _logger.LogInformation("Processing dead-letter message: {MessageId}", message.MessageId);
        try
        {
            // Extract error details from message properties
            var errorMessage = message.ApplicationProperties.TryGetValue("ErrorMessage", out var err)
                ? err?.ToString()
                : "Unknown error";
            var errorType = message.ApplicationProperties.TryGetValue("ErrorType", out var type)
                ? type?.ToString()
                : "Unknown";
            var retryCount = message.ApplicationProperties.TryGetValue("RetryCount", out var count)
                ? Convert.ToInt32(count)
                : 0;
            _logger.LogWarning(
                "DLQ Message - Error: {Error}, Type: {ErrorType}, Retries: {Retries}",
                errorMessage, errorType, retryCount);
            // Parse original message
            var originalMessage = JsonSerializer.Deserialize<DeadLetterMessage>(message.Body.ToString());
            // Determine action based on error type
            var action = DetermineRecoveryAction(errorType, retryCount);
            switch (action)
            {
                case DLQAction.Retry:
                    await RetryMessageAsync(originalMessage);
                    break;
                case DLQAction.RepairAndRetry:
                    await RepairAndRetryAsync(originalMessage);
                    break;
                case DLQAction.Archive:
                    await ArchiveMessageAsync(originalMessage, message);
                    break;
                case DLQAction.Alert:
                    await AlertTeamAsync(originalMessage, errorMessage);
                    break;
            }
            // Returning without throwing lets the runtime complete the DLQ message
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to process DLQ message {MessageId}", message.MessageId);
            throw;
        }
    }
    private DLQAction DetermineRecoveryAction(string errorType, int retryCount)
    {
        if (errorType == "ValidationException" && retryCount < 10)
            return DLQAction.Alert; // Requires manual intervention
        if (errorType == "ExternalServiceException" && retryCount < 3)
            return DLQAction.Retry;
        if (errorType == "JsonException")
            return DLQAction.Archive; // Can't be fixed
        return DLQAction.Alert;
    }
    private async Task RetryMessageAsync(DeadLetterMessage message)
    {
        // Dispose the client and sender when done; in production, prefer a single
        // shared ServiceBusClient registered through dependency injection
        await using var client = new ServiceBusClient(
            "mynamespace.servicebus.windows.net",
            new DefaultAzureCredential());
        await using var sender = client.CreateSender("orders-queue");
        await sender.SendMessageAsync(new ServiceBusMessage(JsonSerializer.Serialize(message.Payload))
        {
            ContentType = "application/json"
        });
        _logger.LogInformation("Message re-queued for retry");
    }
    private async Task ArchiveMessageAsync(DeadLetterMessage message, ServiceBusReceivedMessage original)
    {
        // Archive to blob storage for later analysis
        var blobUri = new Uri(
            "https://mystorage.blob.core.windows.net/dlq-archive/" +
            $"orders/{DateTime.UtcNow:yyyy-MM}/{original.MessageId}.json");
        var blobClient = new BlobClient(blobUri, new DefaultAzureCredential());
        // GetValueOrDefault avoids a KeyNotFoundException when a property is missing
        var properties = original.ApplicationProperties;
        var archiveData = new
        {
            originalMessage = message.Payload,
            errorDetails = new
            {
                errorMessage = properties.GetValueOrDefault("ErrorMessage"),
                errorType = properties.GetValueOrDefault("ErrorType"),
                failedAt = properties.GetValueOrDefault("FailedAt"),
                retryCount = properties.GetValueOrDefault("RetryCount")
            },
            archivedAt = DateTime.UtcNow
        };
        await blobClient.UploadAsync(BinaryData.FromString(JsonSerializer.Serialize(archiveData)));
        _logger.LogInformation("Message archived to blob storage");
    }
    private async Task AlertTeamAsync(DeadLetterMessage message, string error)
    {
        await _alertService.SendAlertAsync(new AlertRequest
        {
            Severity = "High",
            Title = $"Dead Letter: Order {message.OrderId}",
            Message = $"Order processing failed after multiple retries. Error: {error}",
            Recipients = new[] { "operations@company.com", "development@company.com" }
        });
    }
}
public enum DLQAction
{
    Retry,
    RepairAndRetry,
    Archive,
    Alert
}
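Re-queuing a message from the DLQ creates a new message whose delivery count starts over, so a message that keeps failing could cycle between the two queues indefinitely. One way to bound that, sketched here under assumptions, is to carry a resubmission counter in the application properties. The `ResubmitCount` property name and the `ResubmitGuard` helper are hypothetical and not part of the processor above.

```csharp
using System;
using System.Collections.Generic;

public static class ResubmitGuard
{
    // Hypothetical application property used only in this sketch.
    public const string ResubmitCountKey = "ResubmitCount";

    // Reads how many times the message has already been resubmitted (0 if never).
    public static int GetResubmitCount(IReadOnlyDictionary<string, object> properties)
    {
        return properties.TryGetValue(ResubmitCountKey, out var value) && value != null
            ? Convert.ToInt32(value)
            : 0;
    }

    // True while the message is still below the allowed number of resubmissions.
    public static bool CanResubmit(IReadOnlyDictionary<string, object> properties, int maxResubmits)
    {
        return GetResubmitCount(properties) < maxResubmits;
    }

    // Copies the properties and increments the counter for the outgoing message.
    public static Dictionary<string, object> WithIncrementedCount(IReadOnlyDictionary<string, object> properties)
    {
        var updated = new Dictionary<string, object>();
        foreach (var pair in properties)
        {
            updated[pair.Key] = pair.Value;
        }
        updated[ResubmitCountKey] = GetResubmitCount(properties) + 1;
        return updated;
    }
}
```

A retry path could check `CanResubmit` before sending and copy the result of `WithIncrementedCount` onto the outgoing message; once the limit is reached, archiving or alerting is the safer outcome.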
Batch Processing DLQ Messages
[Function("ProcessDLQBatch")]
public async Task Run(
    // IsBatched = true is required to bind an array of messages
    [ServiceBusTrigger(
        "orders-dlq",
        Connection = "ServiceBusConnection",
        IsBatched = true)]
    ServiceBusReceivedMessage[] messages)
{
    _logger.LogInformation("Processing {Count} DLQ messages", messages.Length);
    var failedMessages = new List<ServiceBusMessage>();
    foreach (var message in messages)
    {
        try
        {
            var order = JsonSerializer.Deserialize<Order>(message.Body.ToString());
            // Attempt recovery
            await RecoverMessageAsync(order);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to recover message {MessageId}", message.MessageId);
            // This constructor copies the body, content type, and application
            // properties; ApplicationProperties itself has no setter
            failedMessages.Add(new ServiceBusMessage(message));
        }
    }
    // Re-queue failed messages if any
    if (failedMessages.Count > 0)
    {
        await using var client = new ServiceBusClient(
            "mynamespace.servicebus.windows.net",
            new DefaultAzureCredential());
        await using var sender = client.CreateSender("orders-queue");
        await sender.SendMessagesAsync(failedMessages);
    }
}
Retry Policy Configuration
Custom Retry Policy
public class RetryPolicyConfiguration
{
    private const int MaxRetries = 3;
    private readonly ILogger<RetryPolicyConfiguration> _logger;
    public RetryPolicyConfiguration(ILogger<RetryPolicyConfiguration> logger)
    {
        _logger = logger;
    }
    [Function("ProcessWithRetry")]
    public async Task Run(
        [ServiceBusTrigger("orders-queue", Connection = "ServiceBusConnection", AutoCompleteMessages = false)]
        ServiceBusReceivedMessage message,
        ServiceBusMessageActions messageActions)
    {
        // DeliveryCount is 1 on the first delivery
        var retryCount = message.DeliveryCount;
        try
        {
            await ProcessWithExponentialBackoffAsync(message, retryCount);
            await messageActions.CompleteMessageAsync(message);
        }
        catch (Exception ex) when (retryCount < MaxRetries)
        {
            // Calculate backoff delay. The message lock is held while waiting, so the
            // delay must stay well below the queue's lock duration.
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(ex, "Attempt {RetryCount} failed, waiting {Delay}s before retry",
                retryCount, delay.TotalSeconds);
            await Task.Delay(delay);
            await messageActions.AbandonMessageAsync(message);
        }
        catch (Exception ex)
        {
            // Max retries exhausted - dead letter
            _logger.LogError(ex, "All retries exhausted for message {MessageId}", message.MessageId);
            await messageActions.DeadLetterMessageAsync(message, new Dictionary<string, object>
            {
                { "Error", ex.Message },
                { "Retries", retryCount },
                { "FinalAttempt", DateTime.UtcNow.ToString("o") }
            });
        }
    }
    private async Task ProcessWithExponentialBackoffAsync(ServiceBusReceivedMessage message, int attempt)
    {
        // Simulate a dependency that fails on the first delivery and succeeds afterwards
        if (attempt < 2)
        {
            throw new ExternalServiceException("Service temporarily unavailable", true);
        }
        await Task.Delay(100);
    }
}
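The backoff calculation above grows without bound as the delivery count rises. A common refinement is to cap the delay and add random jitter so that many failing messages do not retry in lockstep. The following is a minimal sketch of that idea; the `Backoff` helper, its parameters, and the choice of "full jitter" are assumptions made for illustration, and the formula here is baseDelay × 2^(attempt − 1) rather than the 2^retryCount seconds used above.

```csharp
using System;

public static class Backoff
{
    // Delay for a given attempt: baseDelay * 2^(attempt - 1), capped at maxDelay.
    public static TimeSpan Exponential(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));

        // Clamp the exponent so very large attempt numbers cannot overflow
        var exponent = Math.Min(attempt - 1, 30);
        var ticks = baseDelay.Ticks * Math.Pow(2, exponent);

        return ticks >= maxDelay.Ticks ? maxDelay : TimeSpan.FromTicks((long)ticks);
    }

    // "Full jitter": a random delay between zero and the capped exponential value.
    public static TimeSpan WithFullJitter(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random random)
    {
        var ceiling = Exponential(attempt, baseDelay, maxDelay);
        return TimeSpan.FromTicks((long)(random.NextDouble() * ceiling.Ticks));
    }
}
```

With a base delay of 1 second and a cap of 30 seconds, attempts 1, 3, and 10 give 1, 4, and 30 seconds respectively. When the delay is spent inside the function, as in the example above, the cap should stay below the queue's lock duration.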
host.json Configuration
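The settings below follow the schema of version 5.x of the Service Bus extension. Note that clientRetryOptions governs retries of Service Bus client operations (for example, a failed connection attempt), not re-execution of a failed function; redelivery of failed messages is controlled by the queue's maxDeliveryCount.
{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "prefetchCount": 10,
      "autoCompleteMessages": false,
      "maxConcurrentCalls": 32,
      "maxAutoLockRenewalDuration": "00:05:00",
      "clientRetryOptions": {
        "mode": "exponential",
        "delay": "00:00:10",
        "maxDelay": "00:05:00",
        "maxRetries": 3
      }
    }
  }
}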
Logging and Monitoring
Structured Logging
[Function("ProcessWithLogging")]
public async Task Run(
    [ServiceBusTrigger("orders-queue")] ServiceBusReceivedMessage message)
{
    var correlationId = message.CorrelationId ?? message.MessageId;
    var deliveryCount = message.DeliveryCount;
    // Every log entry written inside this scope carries these properties
    using var loggerScope = _logger.BeginScope(new Dictionary<string, object>
    {
        ["CorrelationId"] = correlationId,
        ["DeliveryCount"] = deliveryCount,
        ["QueueName"] = "orders-queue",
        ["MessageId"] = message.MessageId
    });
    try
    {
        _logger.LogInformation("Starting processing attempt {Attempt}", deliveryCount);
        var order = JsonSerializer.Deserialize<Order>(message.Body.ToString());
        _logger.LogInformation("Processing order {OrderId} for customer {CustomerId}",
            order.OrderId, order.CustomerId);
        await ProcessOrderInternalAsync(order);
        _logger.LogInformation("Order {OrderId} processed successfully", order.OrderId);
    }
    catch (Exception ex)
    {
        _logger.LogError(ex,
            "Failed to process message {MessageId} on attempt {Attempt}. Error: {Error}",
            message.MessageId, deliveryCount, ex.Message);
        throw;
    }
}
Application Insights Queries
// View recent errors
traces
| where timestamp > ago(1h)
| where severityLevel == 3
| project timestamp, message, customDimensions.FunctionName, customDimensions.CorrelationId
| order by timestamp desc
// View message processing failures
requests
| where timestamp > ago(1h)
| where success == false
| project timestamp, name, success, duration, customDimensions.MessageId
| order by timestamp desc
// View DLQ messages
requests
| where name == "ProcessDeadLetter"
| project timestamp, customDimensions.ErrorMessage, customDimensions.RetryCount
// Calculate error rate
requests
| where timestamp > ago(24h)
| summarize errorCount = countif(success == false), totalCount = count() by bin(timestamp, 1h)
| project hour = bin(timestamp, 1h), errorRate = todouble(errorCount) * 100 / totalCount
Alert Configuration
# Create alert for DLQ messages
# (Service Bus metrics are emitted at the namespace level and filtered by EntityName)
az monitor metrics alert create \
    --name dlq-messages-alert \
    --resource-group my-rg \
    --scopes "/subscriptions/xxx/resourceGroups/my-rg/providers/Microsoft.ServiceBus/namespaces/my-namespace" \
    --condition "avg ActiveMessages > 0 where EntityName includes orders-dlq" \
    --description "DLQ has messages that need attention" \
    --action ops-alerts
# Create alert for high error rate, based on the failed-requests metric of the
# Application Insights resource linked to the function app
# (my-app-insights is a placeholder name)
az monitor metrics alert create \
    --name function-errors-alert \
    --resource-group my-rg \
    --scopes "/subscriptions/xxx/resourceGroups/my-rg/providers/Microsoft.Insights/components/my-app-insights" \
    --condition "count requests/failed > 10" \
    --description "Function has high error rate"
Best Practices
Error Handling Checklist
| Practice | Description |
|---|---|
| Always log errors | Include message ID, context, and stack trace |
| Use exception types | Distinguish between transient and permanent failures |
| Set appropriate retry limits | 3-5 retries for most scenarios |
| Implement DLQ processing | Don't let failed messages disappear |
| Monitor DLQ depth | Alert when DLQ grows large |
Retry Strategy by Error Type
┌────────────────────────────────────────────────────────────┐
│                RETRY STRATEGY BY ERROR TYPE                │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  Network Timeout      → Retry with exponential backoff     │
│  Service unavailable  → Retry with longer delay            │
│  Validation failure   → Don't retry (needs code fix)       │
│  Malformed message    → Don't retry (needs message fix)    │
│  Business logic error → Retry with limited attempts        │
│  Resource quota       → Retry with significant backoff     │
│                                                            │
└────────────────────────────────────────────────────────────┘
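The table above can be expressed as a small classifier that maps an exception to a retry decision. This is a sketch under assumptions: it uses built-in .NET exception types as stand-ins for the error categories, `RetryDecision` and `ErrorClassifier` are hypothetical names, and unknown errors default to retrying because the queue's max delivery count still bounds the attempts.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;

public enum RetryDecision { RetryWithBackoff, DoNotRetry }

public static class ErrorClassifier
{
    public static RetryDecision Classify(Exception ex)
    {
        switch (ex)
        {
            // Malformed message: the payload will never parse, so retrying is pointless
            case JsonException _:
            case FormatException _:
                return RetryDecision.DoNotRetry;

            // Stand-in for validation failures: the same input fails every time
            case ArgumentException _:
                return RetryDecision.DoNotRetry;

            // Network timeouts and unavailable services are usually transient
            case TimeoutException _:
            case HttpRequestException _:
                return RetryDecision.RetryWithBackoff;

            // Unknown errors: retry, relying on the max delivery count as the limit
            default:
                return RetryDecision.RetryWithBackoff;
        }
    }
}
```

Keeping the classification in one place means the function's catch blocks and the DLQ processor can agree on which failures are worth another attempt.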
Anti-Patterns to Avoid
// BAD: Swallowing exceptions
try { ProcessMessage(message); }
catch (Exception ex)
{
    // Do nothing - message disappears!
}
// GOOD: Proper error handling
try { ProcessMessage(message); }
catch (Exception ex)
{
    _logger.LogError(ex, "Failed to process message");
    throw; // Let the runtime handle retry/DLQ
}
// BAD: Rethrowing with "throw ex"
catch (Exception ex)
{
    throw ex; // Resets the stack trace to this line
}
// GOOD: Rethrowing properly
catch (Exception ex)
{
    throw; // Preserves original stack trace
}
// BAD: Logging and swallowing
catch (Exception ex)
{
    _logger.LogError(ex, "Error");
    // No throw - message marked complete incorrectly!
}
// GOOD: Log and throw for proper handling
catch (Exception ex)
{
    _logger.LogError(ex, "Error");
    throw;
}
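The difference between the two rethrow styles is easy to observe in isolation. The sketch below uses a hypothetical `RethrowDemo` class: it captures the stack trace produced by each style so the two can be compared. `NoInlining` is applied so the JIT cannot fold the throwing method into its caller and hide its frame.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RethrowDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Fail()
    {
        throw new InvalidOperationException("boom");
    }

    // Rethrows with "throw;" and returns the stack trace seen by the outer handler.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string TraceWithThrow()
    {
        try
        {
            try { Fail(); }
            catch (Exception) { throw; }
        }
        catch (Exception outer)
        {
            return outer.StackTrace ?? string.Empty;
        }
    }

    // Rethrows with "throw ex;" and returns the stack trace seen by the outer handler.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string TraceWithThrowEx()
    {
        try
        {
            try { Fail(); }
            catch (Exception ex) { throw ex; }
        }
        catch (Exception outer)
        {
            return outer.StackTrace ?? string.Empty;
        }
    }
}
```

The trace from `TraceWithThrow` still contains the `RethrowDemo.Fail` frame where the error originated, while the trace from `TraceWithThrowEx` starts at the rethrow site and no longer mentions it.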
Related Topics
- Functions Triggers — All trigger types
- Service Bus Topics — Pub/sub patterns
- Managed Identity — Secure authentication
Azure Integration Hub - Intermediate Level