Serverless Error Handling and Retry Strategies
Why Error Handling Matters in Serverless
Azure Functions run in a serverless environment where:
- Infrastructure is invisible - You don't control when functions run or scale
- State is external - Functions are stateless, data lives elsewhere
- Integration is async - Functions often process messages from queues
- Failure is inevitable - Network issues, transient errors happen
Without proper error handling:
- Failed messages disappear or block the queue forever
- No visibility into what went wrong
- Cascading failures bring down entire systems
With proper error handling:
- Automatic retries handle transient failures
- Dead letter queues capture persistent failures for investigation
- Circuit breakers prevent cascade failures
- Monitoring provides visibility into issues
Understanding the Retry Flow
┌─────────────────────────────────────────────────────────────────────────────┐
│                     Azure Functions Error Handling Flow                     │
└─────────────────────────────────────────────────────────────────────────────┘
┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Message    │────▶│   Function   │────▶│   Success?   │────▶│   Complete   │
│   Arrives    │     │   Processes  │     │              │     │   Message    │
└──────────────┘     └──────────────┘     └──────┬───────┘     └──────────────┘
                                                 │
                                    ┌────────────┴─────────────┐
                                    ▼                          ▼
                             ┌──────────────┐           ┌──────────────┐
                             │    Failed    │           │    Failed    │
                             │  (Transient) │           │  (Permanent) │
                             └──────┬───────┘           └──────┬───────┘
                                    │                          │
                                    ▼                          ▼
                             ┌──────────────┐           ┌──────────────┐
                             │   Retry N    │           │ Dead Letter  │
                             │    times     │           │    Queue     │
                             └──────┬───────┘           └──────────────┘
                                    │
                          ┌─────────┴─────────┐
                          ▼                   ▼
                    ┌───────────┐      ┌────────────┐
                    │  Succeed  │      │  Exceeded  │
                    │   after   │      │  retries   │
                    │   retry   │      │   → DLQ    │
                    └───────────┘      └────────────┘
Key Components:
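- DeliveryCount: Number of times the message has been delivered; it increases with each attempt
- MaxDeliveryCount: Delivery limit after which Service Bus moves the message to the DLQ
- DeadLetterReason: Why the message went to the DLQ
- LockDuration: How long a receiver holds the message lock for each attempt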
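The flow above reduces to one decision per attempt: complete on success, dead-letter on a permanent failure or once the delivery count reaches the limit, and otherwise abandon so the broker redelivers. A minimal sketch of that decision follows. `MessageOutcome`, `RetryFlow`, and `Decide` are hypothetical names used only for illustration; they are not part of the Service Bus SDK.

```csharp
using System;

// Hypothetical names for illustration; not part of any Azure SDK.
public enum MessageOutcome { Complete, Abandon, DeadLetter }

public static class RetryFlow
{
    // Maps the result of one processing attempt to the action taken on the message.
    public static MessageOutcome Decide(
        bool succeeded, bool isTransient, int deliveryCount, int maxDeliveryCount)
    {
        if (succeeded) return MessageOutcome.Complete;

        // Permanent failures skip retries entirely.
        if (!isTransient) return MessageOutcome.DeadLetter;

        // Transient failures are retried until the delivery count reaches the limit.
        return deliveryCount >= maxDeliveryCount
            ? MessageOutcome.DeadLetter
            : MessageOutcome.Abandon;
    }
}
```

With a limit of 10, a transient failure on the third delivery yields `Abandon`, while the same failure on the tenth delivery yields `DeadLetter`.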
Step 1: Configuring Retry Policies
Service Bus Trigger Configuration
// Function with Service Bus trigger and retry configuration
public class OrderProcessingFunction
{
private readonly IOrderService _orderService;
private readonly ILogger<OrderProcessingFunction> _logger;
public OrderProcessingFunction(
IOrderService orderService,
ILogger<OrderProcessingFunction> logger)
{
_orderService = orderService;
_logger = logger;
}
[Function("ProcessOrder")]
public async Task Run(
[ServiceBusTrigger(
"orders",
Connection = "ServiceBusConnection",
AutoCompleteMessages = false)] // IMPORTANT: Manual completion for retry control
ServiceBusReceivedMessage message,
ServiceBusMessageActions messageActions)
{
var deliveryCount = message.DeliveryCount;
var maxDeliveryCount = 10; // Default for Service Bus
_logger.LogInformation(
"Processing order {OrderId}, Delivery attempt: {Attempt}/{Max}",
message.MessageId, deliveryCount, maxDeliveryCount);
try
{
// Deserialize the JSON body and process the order
var order = message.Body.ToObjectFromJson<Order>();
await _orderService.ProcessOrderAsync(order);
// Success - complete the message
await messageActions.CompleteMessageAsync(message);
_logger.LogInformation("Successfully processed order {OrderId}", order.Id);
}
catch (Exception ex)
{
_logger.LogError(ex,
"Failed to process order {OrderId}, attempt {Attempt}/{Max}",
message.MessageId, deliveryCount, maxDeliveryCount);
// Decide whether to retry or dead-letter
if (deliveryCount >= maxDeliveryCount)
{
// Max retries exceeded - dead letter
_logger.LogError(
"Max delivery count exceeded for order {OrderId}. Moving to DLQ.",
message.MessageId);
await messageActions.DeadLetterMessageAsync(message, new Dictionary<string, object>
{
["ExceptionType"] = ex.GetType().Name,
["ExceptionMessage"] = ex.Message,
["LastAttempt"] = DateTime.UtcNow,
["Attempts"] = deliveryCount
});
}
else
{
// Abandon - releases the lock so Service Bus redelivers the message
// Note: Abandon does not delay redelivery, so it provides no backoff on its own
await messageActions.AbandonMessageAsync(message, new Dictionary<string, object>
{
["RetryAttempt"] = deliveryCount,
["ErrorTime"] = DateTime.UtcNow,
["ErrorMessage"] = ex.Message
});
}
}
}
}
Queue Trigger Configuration
public class QueueProcessingFunction
{
// In-process model: IAsyncCollector output bindings and the ILogger parameter
// are not available in the isolated worker model.
[FunctionName("ProcessQueueItem")]
public async Task Run(
[QueueTrigger("orders-queue")] OrderMessage message,
[Queue("orders-queue")] IAsyncCollector<OrderMessage> retryQueue,
[Queue("orders-queue-poison")] IAsyncCollector<OrderMessage> poisonQueue,
ILogger log)
{
try
{
// Process the message
await ProcessOrderAsync(message);
}
catch (TransientException ex)
{
// For transient errors, implement custom retry with backoff
var retryCount = message.RetryCount ?? 0;
var maxRetries = 3;
if (retryCount >= maxRetries)
{
// Send to the poison queue after max retries.
// Writing back to orders-queue would trigger this function again.
log.LogError(ex, "Max retries exceeded for message {MessageId}", message.Id);
await poisonQueue.AddAsync(new OrderMessage
{
Id = message.Id,
IsPoison = true,
OriginalRetryCount = retryCount,
LastError = ex.Message
});
return;
}
// Exponential backoff: 1s, 2s, 4s
var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
log.LogWarning(
"Transient error processing {MessageId}. Retry {Retry} after {Delay}s",
message.Id, retryCount + 1, delay.TotalSeconds);
// Create retry message with updated count
var retryMessage = new OrderMessage
{
Id = message.Id,
OrderId = message.OrderId,
RetryCount = retryCount + 1,
OriginalQueue = "orders-queue"
};
// Wait before re-enqueueing. The output binding cannot set a visibility
// timeout, so the delay is spent inside this invocation.
await Task.Delay(delay);
await retryQueue.AddAsync(retryMessage);
}
catch (Exception ex)
{
// Permanent error - send to the poison queue immediately
log.LogError(ex, "Permanent error processing {MessageId}", message.Id);
await poisonQueue.AddAsync(new OrderMessage
{
Id = message.Id,
IsPoison = true,
ErrorMessage = ex.Message,
ErrorType = ex.GetType().Name
});
}
}
private async Task ProcessOrderAsync(OrderMessage message)
{
// Actual processing logic
await Task.Delay(100);
}
}
Step 2: Exponential Backoff Implementation
Why Exponential Backoff?
Why Not Fixed Delay?
Fixed: 1 second delay between retries
────────────────────────────────────────
Request 1 ──▶ ✗ (fail) ──▶ wait 1s ──▶ Request 2 ──▶ ✗ (fail) ──▶ wait 1s ──▶ Request 3 ──▶ ✓
Total time: 2 seconds + processing time
Problem: During a service outage, every client retries at the same fixed interval
→ Thundering herd → Service overwhelmed → More failures
Exponential Backoff: 1s, 2s, 4s, 8s...
────────────────────────────────────────────
Request 1 ──▶ ✗ (fail) ──▶ wait 1s ──▶ Request 2 ──▶ ✗ (fail) ──▶ wait 2s ──▶ Request 3 ──▶ ✗ (fail) ──▶ wait 4s ──▶ ...
Total time: 7 seconds + processing time
Benefits:
- Gives service time to recover
- Spreads out retry load
- Reduces the thundering herd effect, especially when combined with jitter
- Adapts to severity of issue
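The arithmetic behind the 1s, 2s, 4s, 8s schedule can be checked with a small helper. This is a sketch under stated assumptions: `BackoffSchedule`, `DelayFor`, and `TotalWait` are hypothetical names, retries are numbered from 1, and jitter is left out so the numbers stay deterministic.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; jitter is deliberately omitted.
public static class BackoffSchedule
{
    // Delay before retry number `retry` (1-based): baseDelay * 2^(retry - 1), capped at maxDelay.
    public static TimeSpan DelayFor(int retry, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        var seconds = baseDelay.TotalSeconds * Math.Pow(2, retry - 1);
        return TimeSpan.FromSeconds(Math.Min(seconds, maxDelay.TotalSeconds));
    }

    // Total time spent waiting across the first `retries` retries.
    public static TimeSpan TotalWait(int retries, TimeSpan baseDelay, TimeSpan maxDelay) =>
        TimeSpan.FromSeconds(
            Enumerable.Range(1, retries)
                .Sum(r => DelayFor(r, baseDelay, maxDelay).TotalSeconds));
}
```

With a 1-second base and a 30-second cap, three retries wait 1 + 2 + 4 = 7 seconds in total, and the sixth retry would wait 32 seconds uncapped, so it is held at 30.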
Implementation
public class ExponentialBackoffService
{
private readonly int _maxRetries;
private readonly TimeSpan _baseDelay;
private readonly TimeSpan _maxDelay;
private readonly double _jitterFactor;
private readonly ILogger<ExponentialBackoffService> _logger;
public ExponentialBackoffService(
int maxRetries = 3,
TimeSpan? baseDelay = null, // TimeSpan is a struct, so null requires a nullable type
TimeSpan? maxDelay = null,
double jitterFactor = 0.2,
ILogger<ExponentialBackoffService> logger = null)
{
_maxRetries = maxRetries;
_baseDelay = baseDelay ?? TimeSpan.FromSeconds(1);
_maxDelay = maxDelay ?? TimeSpan.FromSeconds(30);
_jitterFactor = jitterFactor;
_logger = logger;
}
public async Task<T> ExecuteWithRetryAsync<T>(
Func<Task<T>> operation,
Func<Exception, bool> shouldRetry = null)
{
var attempt = 0;
Exception lastException = null;
while (attempt < _maxRetries)
{
try
{
return await operation();
}
catch (Exception ex)
{
lastException = ex;
// Check if we should retry this exception
if (shouldRetry != null && !shouldRetry(ex))
{
_logger?.LogInformation(
"Exception {ExceptionType} should not be retried",
ex.GetType().Name);
throw;
}
attempt++;
if (attempt >= _maxRetries)
{
_logger?.LogError(
"Max retries ({MaxRetries}) exceeded. Last error: {Message}",
_maxRetries, ex.Message);
throw;
}
// Calculate delay with exponential backoff
var delay = CalculateDelay(attempt);
_logger?.LogWarning(
"Attempt {Attempt}/{MaxRetries} failed. Retrying in {Delay}s. Error: {Message}",
attempt, _maxRetries, delay.TotalSeconds, ex.Message);
await Task.Delay(delay);
}
}
throw lastException;
}
public async Task ExecuteWithRetryAsync(
Func<Task> operation,
Func<Exception, bool> shouldRetry = null)
{
await ExecuteWithRetryAsync(async () =>
{
await operation();
return true;
}, shouldRetry);
}
private TimeSpan CalculateDelay(int attempt)
{
// Exponential: delay = baseDelay * 2^(attempt-1)
var exponentialDelay = _baseDelay * Math.Pow(2, attempt - 1);
// Add jitter to prevent synchronized retries
// Random.Shared (.NET 6+) is thread-safe and avoids allocating a new Random per call
var jitter = TimeSpan.FromTicks(
(long)(exponentialDelay.Ticks * _jitterFactor * (Random.Shared.NextDouble() * 2 - 1)));
var totalDelay = exponentialDelay + jitter;
// Cap at max delay
return totalDelay > _maxDelay ? _maxDelay : totalDelay;
}
}
// Usage in Azure Function
public class OrderProcessingWithBackoff
{
private readonly ExponentialBackoffService _backoffService;
private readonly IOrderService _orderService;
public OrderProcessingWithBackoff(
ExponentialBackoffService backoffService,
IOrderService orderService)
{
_backoffService = backoffService;
_orderService = orderService;
}
[Function("ProcessOrderWithBackoff")]
public async Task Run([ServiceBusTrigger("orders")] ServiceBusReceivedMessage message)
{
var order = message.Body.ToObjectFromJson<Order>();
// Only retry transient exceptions
await _backoffService.ExecuteWithRetryAsync(
async () => await _orderService.ProcessOrderAsync(order),
ex => IsTransientException(ex)
);
// If we reach here, processing succeeded
}
private bool IsTransientException(Exception ex)
{
return ex is TimeoutException ||
ex is HttpRequestException ||
ex is TaskCanceledException ||
(ex is ServiceBusException sbEx && sbEx.Reason == ServiceBusFailureReason.ServiceTimeout);
}
}
Step 3: Dead Letter Queue Handling
Processing Dead Letters
public class DeadLetterProcessor
{
private readonly ServiceBusProcessor _dlqProcessor;
private readonly ServiceBusSender _sender;
private readonly TableClient _deadLetterTable;
private readonly IAlertService _alertService;
private readonly ILogger<DeadLetterProcessor> _logger;
public DeadLetterProcessor(
ServiceBusClient client,
TableClient deadLetterTable,
IAlertService alertService,
ILogger<DeadLetterProcessor> logger)
{
_deadLetterTable = deadLetterTable;
_alertService = alertService;
_logger = logger;
// Reuse one sender for resubmitting messages to the main queue
_sender = client.CreateSender("orders");
// Create processor for dead letter queue
_dlqProcessor = client.CreateProcessor(
"orders",
new ServiceBusProcessorOptions
{
SubQueue = SubQueue.DeadLetter,
MaxConcurrentCalls = 5,
AutoCompleteMessages = false
});
}
public async Task StartProcessingAsync()
{
_dlqProcessor.ProcessMessageAsync += HandleDeadLetterAsync;
_dlqProcessor.ProcessErrorAsync += HandleErrorAsync;
await _dlqProcessor.StartProcessingAsync();
_logger.LogInformation("Dead letter processor started");
}
private Task HandleErrorAsync(ProcessErrorEventArgs args)
{
// Errors raised by the processor itself (connection, lock renewal, and so on)
_logger.LogError(args.Exception, "Dead letter processor error from {Source}", args.ErrorSource);
return Task.CompletedTask;
}
private async Task HandleDeadLetterAsync(ProcessMessageEventArgs args)
{
var message = args.Message;
try
{
// Extract error details
var deadLetterReason = message.DeadLetterReason;
var errorMessage = message.DeadLetterErrorDescription;
var deliveryCount = message.DeliveryCount;
// Extract custom properties; TryGetValue avoids an exception when a property is absent
message.ApplicationProperties.TryGetValue("EventType", out var eventTypeValue);
message.ApplicationProperties.TryGetValue("OrderId", out var orderIdValue);
var eventType = eventTypeValue?.ToString();
var orderId = orderIdValue?.ToString();
_logger.LogWarning(
"Processing dead letter: OrderId={OrderId}, Reason={Reason}, " +
"Attempts={Attempts}, Error={Error}",
orderId, deadLetterReason, deliveryCount, errorMessage);
// Determine action based on error type
var action = DetermineAction(deadLetterReason, errorMessage, deliveryCount);
switch (action)
{
case DeadLetterAction.RetryNow:
// Re-process immediately (for recoverable errors)
await RetryMessageAsync(args);
break;
case DeadLetterAction.Reschedule:
// Re-queue for later processing
await RescheduleMessageAsync(args);
break;
case DeadLetterAction.ArchiveAndNotify:
// Archive to storage and alert team
await ArchiveAndNotifyAsync(message, orderId);
await args.CompleteMessageAsync(message);
break;
case DeadLetterAction.Discard:
// No action needed - message already handled elsewhere
_logger.LogInformation("Discarding dead letter for order {OrderId}", orderId);
await args.CompleteMessageAsync(message);
break;
}
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to process dead letter");
await args.AbandonMessageAsync(message);
}
}
private DeadLetterAction DetermineAction(
string reason,
string errorMessage,
int deliveryCount)
{
// Service Bus sets system reasons such as "MaxDeliveryCountExceeded" and
// "TTLExpiredException". Any other value was supplied by application code
// when it dead-lettered the message with a custom reason.
// Application-defined transient reasons - retry now
if (reason == "MessageLockLost" || reason == "ServiceTimeout")
{
return DeadLetterAction.RetryNow;
}
// Application-defined throttling reasons - reschedule
if (reason == "QuotaExceeded" || reason == "ServiceBusy")
{
return DeadLetterAction.Reschedule;
}
// Permanent errors - archive and notify
if (reason == "MaxDeliveryCountExceeded" ||
reason == "TTLExpiredException")
{
return DeadLetterAction.ArchiveAndNotify;
}
// If too many retries already, archive
if (deliveryCount >= 5)
{
return DeadLetterAction.ArchiveAndNotify;
}
// Default: try again once
return DeadLetterAction.RetryNow;
}
private async Task RetryMessageAsync(ProcessMessageEventArgs args)
{
var message = args.Message;
// Copy message to main queue for retry
var retryMessage = new ServiceBusMessage(message)
{
ApplicationProperties =
{
["IsRetry"] = true,
["OriginalDeadLetterReason"] = message.DeadLetterReason,
["RetryTime"] = DateTime.UtcNow
}
};
await _sender.SendMessageAsync(retryMessage);
// Remove the original from the DLQ so it is not processed again
await args.CompleteMessageAsync(message);
_logger.LogInformation("Retried message from DLQ");
}
private async Task RescheduleMessageAsync(ProcessMessageEventArgs args)
{
var message = args.Message;
// Resubmit with a delay so the downstream service has time to recover.
// The 5-minute delay is an assumption; tune it for your workload.
var delayedMessage = new ServiceBusMessage(message)
{
ScheduledEnqueueTime = DateTimeOffset.UtcNow.AddMinutes(5)
};
await _sender.SendMessageAsync(delayedMessage);
await args.CompleteMessageAsync(message);
_logger.LogInformation("Rescheduled message from DLQ");
}
private async Task ArchiveAndNotifyAsync(
ServiceBusReceivedMessage message,
string orderId)
{
// Archive to table storage
var archiveEntity = new TableEntity
{
PartitionKey = DateTime.UtcNow.ToString("yyyy-MM"),
RowKey = orderId ?? message.MessageId,
["DeadLetterReason"] = message.DeadLetterReason,
["ErrorDescription"] = message.DeadLetterErrorDescription,
["DeliveryCount"] = message.DeliveryCount,
["Body"] = message.Body.ToString(),
["ArchivedAt"] = DateTime.UtcNow,
["OriginalQueue"] = "orders"
};
await _deadLetterTable.AddEntityAsync(archiveEntity);
// Alert the team
await _alertService.SendAlertAsync(
$"Dead Letter: {orderId}",
$"Order {orderId} failed after {message.DeliveryCount} attempts. " +
$"Reason: {message.DeadLetterReason}");
}
}
public enum DeadLetterAction
{
RetryNow,
Reschedule,
ArchiveAndNotify,
Discard
}
Step 4: Circuit Breaker Pattern
Why Circuit Breakers?
Without Circuit Breaker:
┌─────────────────────────────────────────────────────────────┐
│ External Service Failures → Cascading Failure │
│ │
│ Function ──▶ External API ──▶ FAILS │
│ │ │
│ │ (still calling failing service) │
│ ▼ │
│ Function also fails → Queue fills up → System dies │
└─────────────────────────────────────────────────────────────┘
With Circuit Breaker:
┌─────────────────────────────────────────────────────────────┐
│ Circuit Opens → Fast Fail → System Survives │
│ │
│ Function ──▶ Circuit ──▶ API │
│ │ │ │
│ │ │ (too many failures) │
│ │ ▼ │
│ │ Circuit Opens │
│ │ │ │
│ │ Returns fallback immediately │
│ │ │ │
│ ▼ ▼ │
│ Returns cached/default value │
│ │
│ After timeout, circuit half-opens, tests, closes │
└─────────────────────────────────────────────────────────────┘
Implementation
public class CircuitBreaker
{
private readonly object _lock = new();
private readonly ILogger<CircuitBreaker> _logger;
private int _failureCount;
private int _successCount;
private DateTime _lastFailureTime;
private CircuitState _state = CircuitState.Closed;
private readonly int _failureThreshold;
private readonly int _successThreshold;
private readonly TimeSpan _timeout;
private readonly TimeSpan _samplingWindow;
public CircuitBreaker(
int failureThreshold = 5,
int successThreshold = 2,
TimeSpan? timeout = null,
TimeSpan? samplingWindow = null,
ILogger<CircuitBreaker> logger = null)
{
_failureThreshold = failureThreshold;
_successThreshold = successThreshold;
_timeout = timeout ?? TimeSpan.FromSeconds(30);
_samplingWindow = samplingWindow ?? TimeSpan.FromSeconds(60);
_logger = logger;
}
public bool CanExecute => _state != CircuitState.Open;
public async Task<T> ExecuteAsync<T>(Func<Task<T>> operation)
{
// Read and update state under the lock so concurrent callers see a consistent view
lock (_lock)
{
if (_state == CircuitState.Open)
{
// Check if we should try half-open
if (DateTime.UtcNow - _lastFailureTime > _timeout)
{
_state = CircuitState.HalfOpen;
_successCount = 0;
}
else
{
throw new CircuitOpenException(
$"Circuit is open. Failures: {_failureCount}, Last failure: {_lastFailureTime}");
}
}
}
try
{
var result = await operation();
RecordSuccess();
return result;
}
catch (Exception)
{
RecordFailure();
throw;
}
}
public async Task ExecuteAsync(Func<Task> operation)
{
await ExecuteAsync(async () =>
{
await operation();
return true;
});
}
private void RecordSuccess()
{
lock (_lock)
{
_failureCount = 0;
_successCount++;
if (_state == CircuitState.HalfOpen && _successCount >= _successThreshold)
{
_logger?.LogInformation("Circuit breaker closing after {SuccessCount} successes",
_successCount);
_state = CircuitState.Closed;
_successCount = 0;
}
}
}
private void RecordFailure()
{
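lock (_lock)
{
var now = DateTime.UtcNow;
// Start a fresh count when the previous failure falls outside the sampling window
if (now - _lastFailureTime > _samplingWindow)
{
_failureCount = 0;
}
_failureCount++;
_lastFailureTime = now;
_successCount = 0;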
if (_state == CircuitState.HalfOpen)
{
_logger?.LogWarning("Circuit breaker reopening after half-open failure");
_state = CircuitState.Open;
}
else if (_failureCount >= _failureThreshold)
{
_logger?.LogWarning(
"Circuit breaker opening after {FailureCount} failures",
_failureCount);
_state = CircuitState.Open;
}
}
}
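}
// Supporting types used by CircuitBreaker
public enum CircuitState
{
Closed,
Open,
HalfOpen
}
public class CircuitOpenException : Exception
{
public CircuitOpenException(string message) : base(message) { }
}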
// Usage in Azure Function
public class OrderProcessingWithCircuitBreaker
{
// Constructor injection omitted for brevity
private readonly CircuitBreaker _paymentCircuitBreaker;
private readonly IPaymentService _paymentService;
private readonly IOrderService _orderService;
private readonly ICacheService _cacheService;
private readonly ILogger _logger;
[Function("ProcessOrderWithCircuitBreaker")]
public async Task Run([ServiceBusTrigger("orders")] ServiceBusReceivedMessage message)
{
var order = message.Body.ToObjectFromJson<Order>();
try
{
// Try payment with circuit breaker
await _paymentCircuitBreaker.ExecuteAsync(async () =>
{
await _paymentService.ProcessPaymentAsync(order.PaymentInfo);
});
}
catch (CircuitOpenException)
{
// Circuit is open - use fallback
_logger.LogWarning(
"Payment service circuit open. Using fallback payment handling");
// Queue for manual payment processing
await QueueManualPaymentAsync(order);
}
// Continue with order processing
await _orderService.ConfirmOrderAsync(order);
}
}
Step 5: Custom Retry with Scheduled Messages
public class CustomRetryFunction
{
private readonly ServiceBusClient _client;
private readonly ILogger<CustomRetryFunction> _logger;
public CustomRetryFunction(ServiceBusClient client, ILogger<CustomRetryFunction> logger)
{
_client = client;
_logger = logger;
}
[Function("ProcessWithCustomRetry")]
public async Task Run(
[ServiceBusTrigger("orders", AutoCompleteMessages = false)] ServiceBusReceivedMessage message,
ServiceBusMessageActions actions)
{
// RetryCount is a custom property set by this function; it is absent on first delivery
var retryCount = message.ApplicationProperties.TryGetValue("RetryCount", out var value) && value is int count
? count
: 0;
var maxRetries = 3;
try
{
await ProcessOrderAsync(message.Body.ToObjectFromJson<Order>());
await actions.CompleteMessageAsync(message);
}
catch (Exception ex) when (IsTransient(ex))
{
if (retryCount >= maxRetries)
{
_logger.LogError("Max retries exceeded, dead-lettering");
await actions.DeadLetterMessageAsync(message, new Dictionary<string, object>
{
["LastError"] = ex.Message,
["RetryCount"] = retryCount
});
return;
}
_logger.LogWarning("Transient error, retry {Retry}/{Max}", retryCount + 1, maxRetries);
// Calculate delay and schedule re-delivery
var delay = CalculateDelay(retryCount);
// Create new message with incremented retry count
var retryMessage = new ServiceBusMessage(message)
{
ScheduledEnqueueTime = DateTimeOffset.UtcNow.Add(delay),
ApplicationProperties =
{
["RetryCount"] = retryCount + 1,
["OriginalMessageId"] = message.MessageId,
["LastError"] = ex.Message
}
};
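// Send to same queue - will be picked up after delay
// The sender is disposed when the invocation ends; the shared client keeps the connection open
await using var sender = _client.CreateSender("orders");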
await sender.SendMessageAsync(retryMessage);
// Complete original message
await actions.CompleteMessageAsync(message);
}
}
private bool IsTransient(Exception ex)
{
return ex is TimeoutException ||
ex is HttpRequestException ||
ex is IOException ||
(ex as ServiceBusException)?.IsTransient == true;
}
private TimeSpan CalculateDelay(int retryCount)
{
// retryCount is 0 on the first failure, so count attempts from 1: 10s, 40s, 90s...
var attempt = retryCount + 1;
return TimeSpan.FromSeconds(10 * attempt * attempt);
}
}
Step 6: Error Handling Best Practices
Complete Error Handling Pattern
public class ResilientFunction
{
private readonly ILogger<ResilientFunction> _logger;
private readonly IMetricsCollector _metrics;
public ResilientFunction(
ILogger<ResilientFunction> logger,
IMetricsCollector metrics)
{
_logger = logger;
_metrics = metrics;
}
[Function("ResilientOrderProcessing")]
public async Task Run(
[ServiceBusTrigger("orders", AutoCompleteMessages = false)]
ServiceBusReceivedMessage message,
ServiceBusMessageActions actions)
{
var correlationId = message.CorrelationId;
var deliveryCount = message.DeliveryCount;
using var scope = _logger.BeginScope("Processing order {CorrelationId}, Attempt {Attempt}",
correlationId, deliveryCount);
try
{
// Track elapsed time with a monotonic clock (System.Diagnostics.Stopwatch)
var stopwatch = Stopwatch.StartNew();
// Process
await ProcessOrderAsync(message.Body.ToObjectFromJson<Order>());
// Track success metrics
stopwatch.Stop();
_metrics.RecordSuccess("orders", stopwatch.Elapsed);
await actions.CompleteMessageAsync(message);
}
catch (ValidationException ex)
{
// Permanent failure - don't retry
_logger.LogError(ex, "Validation failed for order {CorrelationId}. Not retrying.",
correlationId);
_metrics.RecordFailure("orders", "validation");
await actions.DeadLetterMessageAsync(message, new Dictionary<string, object>
{
["ErrorType"] = "Validation",
["ErrorMessage"] = ex.Message
});
}
catch (TransientException ex) when (deliveryCount < 5)
{
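// Transient failure - abandon so Service Bus redelivers the message.
// Abandon redelivers immediately; RetryDelay below is recorded for diagnostics only.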
_logger.LogWarning(ex, "Transient error processing order {CorrelationId}, attempt {Attempt}",
correlationId, deliveryCount);
_metrics.RecordFailure("orders", "transient");
var delay = TimeSpan.FromSeconds(Math.Pow(2, deliveryCount));
await actions.AbandonMessageAsync(message, new Dictionary<string, object>
{
["AbandonedAt"] = DateTime.UtcNow,
["RetryDelay"] = delay
});
}
catch (Exception ex) when (deliveryCount >= 5)
{
// Max retries exceeded
_logger.LogError(ex, "Max retries exceeded for order {CorrelationId}",
correlationId);
_metrics.RecordFailure("orders", "exhausted");
await actions.DeadLetterMessageAsync(message, new Dictionary<string, object>
{
["ErrorType"] = ex.GetType().Name,
["ErrorMessage"] = ex.Message,
["FinalAttempt"] = deliveryCount
});
}
catch (Exception ex)
{
// Unexpected error
_logger.LogError(ex, "Unexpected error processing order {CorrelationId}",
correlationId);
_metrics.RecordFailure("orders", "unexpected");
throw; // Let host handle
}
}
private async Task ProcessOrderAsync(Order order)
{
// Your business logic
await Task.Delay(100);
}
}
Monitoring and Alerts
public class ErrorMonitoringService
{
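private readonly ILogger<ErrorMonitoringService> _logger;
private readonly IMetricsClient _metricsClient;
public ErrorMonitoringService(
ILogger<ErrorMonitoringService> logger,
IMetricsClient metricsClient)
{
_logger = logger;
_metricsClient = metricsClient;
}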
public async Task CheckAndAlertAsync()
{
// Get metrics from your monitoring system
var recentErrors = await _metricsClient.GetErrorsAsync(
TimeSpan.FromMinutes(15));
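// Cast to double to avoid integer division, and guard against an empty window
var errorRate = recentErrors.TotalRequests == 0
? 0.0
: (double)recentErrors.Total / recentErrors.TotalRequests;
if (errorRate > 0.1) // >10% error rate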
{
await SendAlertAsync(
$"High Error Rate: {errorRate:P1}",
$"Errors in last 15 minutes: {recentErrors.Total}. " +
$"Breakdown: {string.Join(", ", recentErrors.ByType)}");
}
// Check DLQ depth
var dlqDepth = await GetQueueDepthAsync("orders/$deadletterqueue");
if (dlqDepth > 100)
{
await SendAlertAsync(
$"DLQ Backlog: {dlqDepth} messages",
"Dead letter queue has accumulated messages. Investigation required.");
}
}
private Task SendAlertAsync(string title, string message)
{
// Send to your alerting system
_logger.LogCritical("{Title}: {Message}", title, message);
return Task.CompletedTask;
}
}
Best Practices Summary
| Practice | Why | Implementation |
|---|---|---|
| Distinguish error types | Different errors need different handling | Transient vs Permanent |
| Use exponential backoff | Prevent thundering herd | 1s, 2s, 4s, 8s pattern |
| Set max delivery count | Prevent infinite retries | 3-5 for most cases |
| Implement circuit breakers | Prevent cascade failures | Open → Half-open → Closed |
| Log extensively | Debugging is critical | Include correlation IDs |
| Monitor error rates | Early warning system | Set up alerts |
Testing Error Handling
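[Fact]
public async Task Retry_ExponentialBackoff_Works()
{
// Arrange: a short base delay and no jitter keep the test fast and predictable
var backoffService = new ExponentialBackoffService(
maxRetries: 3,
baseDelay: TimeSpan.FromMilliseconds(100),
jitterFactor: 0);
var callTimes = new List<TimeSpan>();
var stopwatch = Stopwatch.StartNew();
Task OperationThatAlwaysFails()
{
callTimes.Add(stopwatch.Elapsed); // Record when each attempt starts
throw new TransientException("Test error");
}
// Act & Assert
await Assert.ThrowsAsync<TransientException>(() =>
backoffService.ExecuteWithRetryAsync(
OperationThatAlwaysFails,
ex => true));
Assert.Equal(3, callTimes.Count);
var firstGap = callTimes[1] - callTimes[0];
var secondGap = callTimes[2] - callTimes[1];
Assert.True(secondGap > firstGap); // Each delay increases
}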
Conclusion
Robust error handling is essential for serverless applications:
- Automatic retries handle transient failures gracefully
- Dead letter queues capture persistent failures for investigation
- Exponential backoff prevents overwhelming downstream services
- Circuit breakers prevent cascade failures
- Comprehensive logging enables debugging
Key takeaways:
- Distinguish transient from permanent errors
- Implement exponential backoff, not fixed delays
- Set reasonable max delivery counts (3-5)
- Use circuit breakers for external dependencies
- Monitor error rates and DLQ depth
- Test your error handling thoroughly
Azure Integration Hub - Functions