Serverless Error Handling and Retry Strategies

Comprehensive guide to implementing resilient error handling in Azure Functions with dead-letter triggers, exponential backoff, and retry patterns

Why Error Handling Matters in Serverless

Azure Functions run in a serverless environment where:

Without proper error handling:

With proper error handling:


Understanding the Retry Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                    Azure Functions Error Handling Flow                      │
└─────────────────────────────────────────────────────────────────────────────┘

┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Message    │────▶│   Function   │────▶│   Success?   │────▶│  Complete    │
│   Arrives    │     │   Processes  │     │              │     │  Message     │
└──────────────┘     └──────────────┘     └──────┬───────┘     └──────────────┘
                                                 │
                                    ┌────────────┴─────────────┐
                                    ▼                          ▼
                             ┌──────────────┐              ┌──────────────┐
                             │    Failed    │              │    Failed    │
                             │   (Transient)│              │   (Permanent)│
                             └──────┬───────┘              └──────┬───────┘
                                    │                             │
                                    ▼                             ▼
                             ┌──────────────┐              ┌──────────────┐
                             │   Retry N    │              │  Dead Letter │
                             │   times      │              │    Queue     │
                             └──────┬───────┘              └──────────────┘
                                    │
                          ┌─────────┴─────────┐
                          ▼                   ▼
                   ┌───────────┐       ┌────────────┐
                   │  Succeed  │       │  Exceeded  │
                   │   after   │       │   retries  │
                   │   retry   │       │  → DLQ     │
                   └───────────┘       └────────────┘


Key Components:
- DeliveryCount: Tracks retry attempts
- MaxDeliveryCount: When to stop retrying
- DeadLetterReason: Why message went to DLQ
- LockDuration: How long each attempt gets
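
The flow above comes down to one decision per failed delivery. The following is a minimal sketch of that decision as a pure function. The `FailureAction` enum and the `RetryDecision.Decide` helper are hypothetical names introduced for illustration; they are not part of the Service Bus SDK.

```csharp
using System;

public enum FailureAction { Retry, DeadLetter }

public static class RetryDecision
{
    // Hypothetical helper: maps one failed delivery to the next step in the flow.
    // deliveryCount is the number of attempts made so far, including this one.
    public static FailureAction Decide(bool isTransient, int deliveryCount, int maxDeliveryCount)
    {
        // Permanent failures go straight to the dead-letter queue
        if (!isTransient) return FailureAction.DeadLetter;

        // Transient failures are retried until the attempts run out
        return deliveryCount >= maxDeliveryCount
            ? FailureAction.DeadLetter
            : FailureAction.Retry;
    }
}
```

Keeping the decision separate from the trigger code makes it easy to unit test without a Service Bus namespace.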

Step 1: Configuring Retry Policies

Service Bus Trigger Configuration

// Function with Service Bus trigger and retry configuration
public class OrderProcessingFunction
{
    private readonly IOrderService _orderService;
    private readonly ILogger<OrderProcessingFunction> _logger;

    public OrderProcessingFunction(
        IOrderService orderService,
        ILogger<OrderProcessingFunction> logger)
    {
        _orderService = orderService;
        _logger = logger;
    }

    [Function("ProcessOrder")]
    public async Task Run(
        [ServiceBusTrigger(
            "orders",
            Connection = "ServiceBusConnection",
            AutoCompleteMessages = false  // IMPORTANT: Manual completion for retry control
        )] ServiceBusReceivedMessage message,
        ServiceBusMessageActions messageActions)
    {
        var deliveryCount = message.DeliveryCount;
        var maxDeliveryCount = 10; // Default for Service Bus

        _logger.LogInformation(
            "Processing order {OrderId}, Delivery attempt: {Attempt}/{Max}",
            message.MessageId, deliveryCount, maxDeliveryCount);

        try
        {
            // Process the order
            var order = message.Body.ToObjectFromJson<Order>();
            await _orderService.ProcessOrderAsync(order);

            // Success - complete the message
            await messageActions.CompleteMessageAsync(message);
            
            _logger.LogInformation("Successfully processed order {OrderId}", order.Id);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, 
                "Failed to process order {OrderId}, attempt {Attempt}/{Max}",
                message.MessageId, deliveryCount, maxDeliveryCount);

            // Decide whether to retry or dead-letter
            if (deliveryCount >= maxDeliveryCount)
            {
                // Max retries exceeded - dead letter
                _logger.LogError(
                    "Max delivery count exceeded for order {OrderId}. Moving to DLQ.",
                    message.MessageId);

                await messageActions.DeadLetterMessageAsync(message, new Dictionary<string, object>
                {
                    ["ExceptionType"] = ex.GetType().Name,
                    ["ExceptionMessage"] = ex.Message,
                    ["LastAttempt"] = DateTime.UtcNow,
                    ["Attempts"] = deliveryCount
                });
            }
            else
            {
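                // Abandon - releases the lock so Service Bus redelivers the message.
                // Redelivery is immediate: abandoning does not add a delay, and the
                // properties below are recorded for diagnostics only.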
                await messageActions.AbandonMessageAsync(message, new Dictionary<string, object>
                {
                    ["RetryAttempt"] = deliveryCount,
                    ["ErrorTime"] = DateTime.UtcNow,
                    ["ErrorMessage"] = ex.Message
                });
            }
        }
    }
}

Queue Trigger Configuration

public class QueueProcessingFunction
{
    // Uses the in-process model: [FunctionName], IAsyncCollector and an ILogger parameter
    [FunctionName("ProcessQueueItem")]
    public async Task Run(
        [QueueTrigger("orders-queue")] OrderMessage message,
        [Queue("orders-queue")] IAsyncCollector<OrderMessage> retryQueue,
        // Failed messages go to a separate queue so this function does not pick them
        // up again; the name mirrors the "<queue>-poison" convention
        [Queue("orders-queue-poison")] IAsyncCollector<OrderMessage> poisonQueue,
        ILogger log)
    {
        try
        {
            // Process the message
            await ProcessOrderAsync(message);
        }
        catch (TransientException ex)
        {
            // For transient errors, implement custom retry with backoff
            var retryCount = message.RetryCount ?? 0;
            var maxRetries = 3;

            if (retryCount >= maxRetries)
            {
                // Send to poison queue after max retries
                log.LogError("Max retries exceeded for message {MessageId}", message.Id);

                await poisonQueue.AddAsync(new OrderMessage
                {
                    Id = message.Id,
                    OrderId = message.OrderId,
                    IsPoison = true,
                    OriginalRetryCount = retryCount,
                    LastError = ex.Message
                });

                return;
            }

            // Exponential backoff: 1s, 2s, 4s
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            log.LogWarning(
                "Transient error processing {MessageId}. Retry {Retry} after {Delay}s",
                message.Id, retryCount + 1, delay.TotalSeconds);

            // Create retry message with updated count
            var retryMessage = new OrderMessage
            {
                Id = message.Id,
                OrderId = message.OrderId,
                RetryCount = retryCount + 1,
                OriginalQueue = "orders-queue"
            };

            // Wait in-process before re-queueing. The invocation keeps running for
            // the whole delay, so this only suits short backoffs.
            await Task.Delay(delay);
            await retryQueue.AddAsync(retryMessage);
        }
        catch (Exception ex)
        {
            // Permanent error - send to poison queue immediately
            log.LogError(ex, "Permanent error processing {MessageId}", message.Id);

            await poisonQueue.AddAsync(new OrderMessage
            {
                Id = message.Id,
                OrderId = message.OrderId,
                IsPoison = true,
                ErrorMessage = ex.Message,
                ErrorType = ex.GetType().Name
            });
        }
    }

    private async Task ProcessOrderAsync(OrderMessage message)
    {
        // Actual processing logic
        await Task.Delay(100);
    }
}

Step 2: Exponential Backoff Implementation

Why Exponential Backoff?

Why Not Fixed Delay?

Fixed: 1 second delay between retries
────────────────────────────────────────
Request 1 ──▶ ✗ (fail) ──▶ wait 1s ──▶ Request 2 ──▶ ✗ (fail) ──▶ wait 1s ──▶ Request 3 ──▶ ✓

Total time: 2 seconds of waiting + processing time

Problem: During service outage, all clients retry at same time
→ Thundering herd → Service overwhelmed → More failures


Exponential Backoff: 1s, 2s, 4s, 8s...
────────────────────────────────────────────
Request 1 ──▶ ✗ (fail) ──▶ wait 1s ──▶ Request 2 ──▶ ✗ (fail) ──▶ wait 2s ──▶ Request 3 ──▶ ✗ (fail) ──▶ wait 4s ──▶ ...

Total time: 7 seconds + processing time

Benefits:
- Gives service time to recover
- Spreads out retry load
- Reduces the thundering herd effect, especially when combined with jitter
- Adapts to severity of issue
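
To make the schedule concrete, here is a small sketch that computes the capped exponential delay for a given attempt and optionally randomizes it. The `Backoff` class and its method names are illustrative assumptions. The randomization shown is the "full jitter" variant, which picks a delay anywhere between zero and the exponential value; other jitter schemes (such as adding a small plus-or-minus percentage) are equally valid.

```csharp
using System;

public static class Backoff
{
    // Delay before retry number `attempt` (1-based):
    // baseDelay * 2^(attempt - 1), capped at maxDelay.
    public static TimeSpan Exponential(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));

        double seconds = baseDelay.TotalSeconds * Math.Pow(2, attempt - 1);
        return TimeSpan.FromSeconds(Math.Min(seconds, maxDelay.TotalSeconds));
    }

    // Full jitter: a uniformly random delay in [0, delay), so that clients
    // that failed at the same moment do not retry at the same moment.
    public static TimeSpan WithFullJitter(TimeSpan delay, Random rng) =>
        TimeSpan.FromSeconds(delay.TotalSeconds * rng.NextDouble());
}
```

With a 1-second base and a 30-second cap, `Exponential` yields 1s, 2s, 4s, 8s, 16s and then 30s for attempts 1 through 6.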

Implementation

public class ExponentialBackoffService
{
    private readonly int _maxRetries;
    private readonly TimeSpan _baseDelay;
    private readonly TimeSpan _maxDelay;
    private readonly double _jitterFactor;
    private readonly ILogger<ExponentialBackoffService> _logger;

    // TimeSpan is a struct, so optional delays must be nullable to default to null
    public ExponentialBackoffService(
        int maxRetries = 3,
        TimeSpan? baseDelay = null,
        TimeSpan? maxDelay = null,
        double jitterFactor = 0.2,
        ILogger<ExponentialBackoffService> logger = null)
    {
        _maxRetries = maxRetries;
        _baseDelay = baseDelay ?? TimeSpan.FromSeconds(1);
        _maxDelay = maxDelay ?? TimeSpan.FromSeconds(30);
        _jitterFactor = jitterFactor;
        _logger = logger;
    }

    public async Task<T> ExecuteWithRetryAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> shouldRetry = null)
    {
        var attempt = 0;
        Exception lastException = null;

        while (attempt < _maxRetries)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex)
            {
                lastException = ex;

                // Check if we should retry this exception
                if (shouldRetry != null && !shouldRetry(ex))
                {
                    _logger?.LogInformation(
                        "Exception {ExceptionType} should not be retried",
                        ex.GetType().Name);
                    throw;
                }

                attempt++;
                
                if (attempt >= _maxRetries)
                {
                    _logger?.LogError(
                        "Max retries ({MaxRetries}) exceeded. Last error: {Message}",
                        _maxRetries, ex.Message);
                    throw;
                }

                // Calculate delay with exponential backoff
                var delay = CalculateDelay(attempt);
                
                _logger?.LogWarning(
                    "Attempt {Attempt}/{MaxRetries} failed. Retrying in {Delay}s. Error: {Message}",
                    attempt, _maxRetries, delay.TotalSeconds, ex.Message);

                await Task.Delay(delay);
            }
        }

        throw lastException;
    }

    public async Task ExecuteWithRetryAsync(
        Func<Task> operation,
        Func<Exception, bool> shouldRetry = null)
    {
        await ExecuteWithRetryAsync(async () =>
        {
            await operation();
            return true;
        }, shouldRetry);
    }

    private TimeSpan CalculateDelay(int attempt)
    {
        // Exponential: delay = baseDelay * 2^(attempt-1)
        var exponentialDelay = _baseDelay * Math.Pow(2, attempt - 1);
        
        // Add jitter to prevent synchronized retries
        var jitter = TimeSpan.FromTicks(
            (long)(exponentialDelay.Ticks * _jitterFactor * (new Random().NextDouble() * 2 - 1)));
        
        var totalDelay = exponentialDelay + jitter;
        
        // Cap at max delay
        return totalDelay > _maxDelay ? _maxDelay : totalDelay;
    }
}

// Usage in Azure Function
public class OrderProcessingWithBackoff
{
    private readonly ExponentialBackoffService _backoffService;
    private readonly IOrderService _orderService;

    public OrderProcessingWithBackoff(
        ExponentialBackoffService backoffService,
        IOrderService orderService)
    {
        _backoffService = backoffService;
        _orderService = orderService;
    }

    [Function("ProcessOrderWithBackoff")]
    public async Task Run([ServiceBusTrigger("orders")] ServiceBusReceivedMessage message)
    {
        var order = message.Body.ToObjectFromJson<Order>();

        // Only retry transient exceptions
        await _backoffService.ExecuteWithRetryAsync(
            async () => await _orderService.ProcessOrderAsync(order),
            ex => IsTransientException(ex)
        );

        // If we reach here, processing succeeded
    }

    private bool IsTransientException(Exception ex)
    {
        return ex is TimeoutException ||
               ex is HttpRequestException ||
               ex is TaskCanceledException ||
               (ex is ServiceBusException sbEx && sbEx.Reason == ServiceBusFailureReason.ServiceTimeout);
    }
}

Step 3: Dead Letter Queue Handling

Processing Dead Letters

public class DeadLetterProcessor
{
    // Delay used when a message is rescheduled rather than retried at once (illustrative value)
    private static readonly TimeSpan RescheduleDelay = TimeSpan.FromMinutes(5);

    private readonly ServiceBusProcessor _dlqProcessor;
    private readonly ServiceBusSender _sender;
    private readonly TableClient _deadLetterTable;
    private readonly IAlertService _alertService;
    private readonly ILogger<DeadLetterProcessor> _logger;

    public DeadLetterProcessor(
        ServiceBusClient client,
        TableClient deadLetterTable,
        IAlertService alertService,
        ILogger<DeadLetterProcessor> logger)
    {
        _deadLetterTable = deadLetterTable;
        _alertService = alertService;
        _logger = logger;

        // One sender, reused for every message re-submitted to the main queue
        _sender = client.CreateSender("orders");

        // Create processor for dead letter queue
        _dlqProcessor = client.CreateProcessor(
            "orders",
            new ServiceBusProcessorOptions
            {
                SubQueue = SubQueue.DeadLetter,
                MaxConcurrentCalls = 5,
                AutoCompleteMessages = false
            });
    }

    public async Task StartProcessingAsync()
    {
        _dlqProcessor.ProcessMessageAsync += HandleDeadLetterAsync;
        _dlqProcessor.ProcessErrorAsync += HandleErrorAsync;

        await _dlqProcessor.StartProcessingAsync();
        
        _logger.LogInformation("Dead letter processor started");
    }

    private Task HandleErrorAsync(ProcessErrorEventArgs args)
    {
        _logger.LogError(args.Exception,
            "Dead letter processor error from {Source}", args.ErrorSource);
        return Task.CompletedTask;
    }

    private async Task HandleDeadLetterAsync(ProcessMessageEventArgs args)
    {
        var message = args.Message;
        
        try
        {
            // Extract error details
            var deadLetterReason = message.DeadLetterReason;
            var errorMessage = message.DeadLetterErrorDescription;
            var deliveryCount = message.DeliveryCount;
            
            // Extract custom properties (a missing key yields null instead of throwing)
            var eventType = GetProperty(message, "EventType");
            var orderId = GetProperty(message, "OrderId");

            _logger.LogWarning(
                "Processing dead letter: OrderId={OrderId}, EventType={EventType}, " +
                "Reason={Reason}, Attempts={Attempts}, Error={Error}",
                orderId, eventType, deadLetterReason, deliveryCount, errorMessage);

            // Determine action based on error type
            var action = DetermineAction(deadLetterReason, errorMessage, deliveryCount);

            switch (action)
            {
                case DeadLetterAction.RetryNow:
                    // Re-submit immediately (for recoverable errors), then remove from DLQ
                    await RetryMessageAsync(message);
                    await args.CompleteMessageAsync(message);
                    break;

                case DeadLetterAction.Reschedule:
                    // Re-submit for later processing, then remove from DLQ
                    await RetryMessageAsync(message, RescheduleDelay);
                    await args.CompleteMessageAsync(message);
                    break;

                case DeadLetterAction.ArchiveAndNotify:
                    // Archive to storage and alert team
                    await ArchiveAndNotifyAsync(message, orderId);
                    await args.CompleteMessageAsync(message);
                    break;

                case DeadLetterAction.Discard:
                    // No action needed - message already handled elsewhere
                    _logger.LogInformation("Discarding dead letter for order {OrderId}", orderId);
                    await args.CompleteMessageAsync(message);
                    break;
            }
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to process dead letter");
            await args.AbandonMessageAsync(message);
        }
    }

    private static string GetProperty(ServiceBusReceivedMessage message, string key) =>
        message.ApplicationProperties.TryGetValue(key, out var value) ? value?.ToString() : null;

    private DeadLetterAction DetermineAction(
        string reason, 
        string errorMessage,
        int deliveryCount)
    {
        // Analyze the error and determine appropriate action

        // Application-defined reasons (set by the code that dead-lettered the message)
        // Transient errors - retry now
        if (reason == "MessageLockLost" || reason == "ServiceTimeout")
        {
            return DeadLetterAction.RetryNow;
        }

        // Queue full or service busy - reschedule
        if (reason == "QuotaExceeded" || reason == "ServiceBusy")
        {
            return DeadLetterAction.Reschedule;
        }

        // Reasons set by Service Bus itself - archive and notify
        if (reason == "MaxDeliveryCountExceeded" || 
            reason == "TTLExpiredException")
        {
            return DeadLetterAction.ArchiveAndNotify;
        }

        // If too many retries already, archive
        if (deliveryCount >= 5)
        {
            return DeadLetterAction.ArchiveAndNotify;
        }

        // Default: try again once
        return DeadLetterAction.RetryNow;
    }

    private async Task RetryMessageAsync(
        ServiceBusReceivedMessage message,
        TimeSpan? delay = null)
    {
        // Copy message to main queue for retry
        var retryMessage = new ServiceBusMessage(message)
        {
            ApplicationProperties =
            {
                ["IsRetry"] = true,
                ["OriginalDeadLetterReason"] = message.DeadLetterReason ?? string.Empty,
                ["RetryTime"] = DateTime.UtcNow
            }
        };

        // For rescheduled messages, delay delivery instead of enqueueing at once
        if (delay.HasValue)
        {
            retryMessage.ScheduledEnqueueTime = DateTimeOffset.UtcNow.Add(delay.Value);
        }

        await _sender.SendMessageAsync(retryMessage);
        
        _logger.LogInformation("Retried message from DLQ");
    }

    private async Task ArchiveAndNotifyAsync(
        ServiceBusReceivedMessage message,
        string orderId)
    {
        // Archive to table storage
        var archiveEntity = new TableEntity
        {
            PartitionKey = DateTime.UtcNow.ToString("yyyy-MM"),
            RowKey = orderId ?? message.MessageId,
            ["DeadLetterReason"] = message.DeadLetterReason,
            ["ErrorDescription"] = message.DeadLetterErrorDescription,
            ["DeliveryCount"] = message.DeliveryCount,
            ["Body"] = message.Body.ToString(),
            ["ArchivedAt"] = DateTime.UtcNow,
            ["OriginalQueue"] = "orders"
        };

        await _deadLetterTable.AddEntityAsync(archiveEntity);

        // Alert the team
        await _alertService.SendAlertAsync(
            $"Dead Letter: {orderId}",
            $"Order {orderId} failed after {message.DeliveryCount} attempts. " +
            $"Reason: {message.DeadLetterReason}");
    }
}

public enum DeadLetterAction
{
    RetryNow,
    Reschedule,
    ArchiveAndNotify,
    Discard
}

Step 4: Circuit Breaker Pattern

Why Circuit Breakers?

Without Circuit Breaker:

┌─────────────────────────────────────────────────────────────┐
│  External Service Failures → Cascading Failure              │
│                                                             │
│   Function ──▶ External API ──▶ FAILS                       │
│       │                                                     │
│       │ (still calling failing service)                     │
│       ▼                                                     │
│   Function also fails → Queue fills up → System dies        │
└─────────────────────────────────────────────────────────────┘

With Circuit Breaker:

┌─────────────────────────────────────────────────────────────┐
│  Circuit Opens → Fast Fail → System Survives                │
│                                                             │
│   Function ──▶ Circuit ──▶ API                              │
│       │              │                                      │
│       │              │ (too many failures)                  │
│       │              ▼                                      │
│       │         Circuit Opens                               │
│       │              │                                      │
│       │         Returns fallback immediately                │
│       │              │                                      │
│       ▼              ▼                                      │
│   Returns cached/default value                              │
│                                                             │
│   After timeout, circuit half-opens, tests, closes          │
└─────────────────────────────────────────────────────────────┘
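
The state changes in the diagram can be written down as a small transition function. This is only a sketch of the state machine, not a full breaker: counting failures and tracking the timeout are left to the caller. The `BreakerState`, `BreakerEvent` and `BreakerTransitions` names are assumptions made for this example.

```csharp
using System;

public enum BreakerState { Closed, Open, HalfOpen }

public enum BreakerEvent
{
    FailureThresholdReached, // too many failures while closed
    TimeoutElapsed,          // the open period is over
    ProbeSucceeded,          // enough test calls succeeded while half-open
    ProbeFailed              // a test call failed while half-open
}

public static class BreakerTransitions
{
    // Returns the state the breaker moves to when `evt` occurs in `state`.
    public static BreakerState Next(BreakerState state, BreakerEvent evt) => (state, evt) switch
    {
        (BreakerState.Closed, BreakerEvent.FailureThresholdReached) => BreakerState.Open,
        (BreakerState.Open, BreakerEvent.TimeoutElapsed) => BreakerState.HalfOpen,
        (BreakerState.HalfOpen, BreakerEvent.ProbeSucceeded) => BreakerState.Closed,
        (BreakerState.HalfOpen, BreakerEvent.ProbeFailed) => BreakerState.Open,
        _ => state // every other combination leaves the state unchanged
    };
}
```

Writing the transitions as data makes the rule explicit that a half-open circuit never stays half-open after a probe result: it either closes or reopens.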

Implementation

public class CircuitBreaker
{
    private readonly object _lock = new();
    private readonly ILogger<CircuitBreaker> _logger;
    
    private int _failureCount;
    private int _successCount;
    private DateTime _lastFailureTime;
    private CircuitState _state = CircuitState.Closed;
    
    private readonly int _failureThreshold;
    private readonly int _successThreshold;
    private readonly TimeSpan _timeout;
    private readonly TimeSpan _samplingWindow;

    // TimeSpan is a struct, so optional durations must be nullable to default to null
    public CircuitBreaker(
        int failureThreshold = 5,
        int successThreshold = 2,
        TimeSpan? timeout = null,
        TimeSpan? samplingWindow = null,
        ILogger<CircuitBreaker> logger = null)
    {
        _failureThreshold = failureThreshold;
        _successThreshold = successThreshold;
        _timeout = timeout ?? TimeSpan.FromSeconds(30);
        _samplingWindow = samplingWindow ?? TimeSpan.FromSeconds(60);
        _logger = logger;
    }

    public bool CanExecute => _state != CircuitState.Open;

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> operation)
    {
        // Read and update the state under the same lock used by RecordSuccess/RecordFailure
        lock (_lock)
        {
            if (_state == CircuitState.Open)
            {
                // Check if we should try half-open
                if (DateTime.UtcNow - _lastFailureTime > _timeout)
                {
                    _state = CircuitState.HalfOpen;
                }
                else
                {
                    throw new CircuitOpenException(
                        $"Circuit is open. Failures: {_failureCount}, Last failure: {_lastFailureTime}");
                }
            }
        }

        try
        {
            var result = await operation();
            RecordSuccess();
            return result;
        }
        catch (Exception ex)
        {
            RecordFailure();
            throw;
        }
    }

    public async Task ExecuteAsync(Func<Task> operation)
    {
        await ExecuteAsync(async () =>
        {
            await operation();
            return true;
        });
    }

    private void RecordSuccess()
    {
        lock (_lock)
        {
            _failureCount = 0;
            _successCount++;

            if (_state == CircuitState.HalfOpen && _successCount >= _successThreshold)
            {
                _logger?.LogInformation("Circuit breaker closing after {SuccessCount} successes", 
                    _successCount);
                _state = CircuitState.Closed;
                _successCount = 0;
            }
        }
    }

    private void RecordFailure()
    {
        lock (_lock)
        {
            _failureCount++;
            _lastFailureTime = DateTime.UtcNow;
            _successCount = 0;

            if (_state == CircuitState.HalfOpen)
            {
                _logger?.LogWarning("Circuit breaker reopening after half-open failure");
                _state = CircuitState.Open;
            }
            else if (_failureCount >= _failureThreshold)
            {
                _logger?.LogWarning(
                    "Circuit breaker opening after {FailureCount} failures",
                    _failureCount);
                _state = CircuitState.Open;
            }
        }
    }
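}

// Supporting types used by CircuitBreaker
public enum CircuitState
{
    Closed,
    Open,
    HalfOpen
}

public class CircuitOpenException : Exception
{
    public CircuitOpenException(string message) : base(message) { }
}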

// Usage in Azure Function
public class OrderProcessingWithCircuitBreaker
{
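    private readonly CircuitBreaker _paymentCircuitBreaker;
    private readonly IPaymentService _paymentService;
    private readonly IOrderService _orderService;
    private readonly ICacheService _cacheService;
    private readonly ILogger<OrderProcessingWithCircuitBreaker> _logger;

    // Register CircuitBreaker as a singleton so its state is shared across invocations
    public OrderProcessingWithCircuitBreaker(
        CircuitBreaker paymentCircuitBreaker,
        IPaymentService paymentService,
        IOrderService orderService,
        ICacheService cacheService,
        ILogger<OrderProcessingWithCircuitBreaker> logger)
    {
        _paymentCircuitBreaker = paymentCircuitBreaker;
        _paymentService = paymentService;
        _orderService = orderService;
        _cacheService = cacheService;
        _logger = logger;
    }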

    [Function("ProcessOrderWithCircuitBreaker")]
    public async Task Run([ServiceBusTrigger("orders")] ServiceBusReceivedMessage message)
    {
        var order = message.Body.ToObjectFromJson<Order>();

        try
        {
            // Try payment with circuit breaker
            await _paymentCircuitBreaker.ExecuteAsync(async () =>
            {
                await _paymentService.ProcessPaymentAsync(order.PaymentInfo);
            });
        }
        catch (CircuitOpenException)
        {
            // Circuit is open - use fallback
            _logger.LogWarning(
                "Payment service circuit open. Using fallback payment handling");

            // Queue for manual payment processing
            // (QueueManualPaymentAsync is an application-specific helper, not shown here)
            await QueueManualPaymentAsync(order);
        }

        // Continue with order processing
        await _orderService.ConfirmOrderAsync(order);
    }
}

Step 5: Custom Retry with Visibility Timeout

public class CustomRetryFunction
{
    private readonly ServiceBusClient _client;
    private readonly ILogger<CustomRetryFunction> _logger;

    public CustomRetryFunction(ServiceBusClient client, ILogger<CustomRetryFunction> logger)
    {
        _client = client;
        _logger = logger;
    }

    [Function("ProcessWithCustomRetry")]
    public async Task Run(
        [ServiceBusTrigger("orders", AutoCompleteMessages = false)] ServiceBusReceivedMessage message,
        ServiceBusMessageActions actions)
    {
        var retryCount = (int)(message.ApplicationProperties.GetValueOrDefault("RetryCount", 0));
        var maxRetries = 3;

        try
        {
            await ProcessOrderAsync(message.Body.ToObjectFromJson<Order>());
            await actions.CompleteMessageAsync(message);
        }
        catch (Exception ex) when (IsTransient(ex))
        {
            if (retryCount >= maxRetries)
            {
                _logger.LogError("Max retries exceeded, dead-lettering");
                await actions.DeadLetterMessageAsync(message, new Dictionary<string, object>
                {
                    ["LastError"] = ex.Message,
                    ["RetryCount"] = retryCount
                });
                return;
            }

            _logger.LogWarning("Transient error, retry {Retry}/{Max}", retryCount + 1, maxRetries);

            // Calculate delay and schedule re-delivery
            var delay = CalculateDelay(retryCount);
            
            // Create new message with incremented retry count
            var retryMessage = new ServiceBusMessage(message)
            {
                ScheduledEnqueueTime = DateTimeOffset.UtcNow.Add(delay),
                ApplicationProperties =
                {
                    ["RetryCount"] = retryCount + 1,
                    ["OriginalMessageId"] = message.MessageId,
                    ["LastError"] = ex.Message
                }
            };

            // Send to same queue - will be picked up after delay
            // (the sender is disposed once the catch block exits)
            await using var sender = _client.CreateSender("orders");
            await sender.SendMessageAsync(retryMessage);

            // Complete original message
            await actions.CompleteMessageAsync(message);
        }
    }

    private bool IsTransient(Exception ex)
    {
        return ex is TimeoutException ||
               ex is HttpRequestException ||
               ex is IOException ||
               (ex as ServiceBusException)?.IsTransient == true;
    }

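    private TimeSpan CalculateDelay(int retryCount)
    {
        // Delays before retry 1, 2 and 3: 10s, 30s, 60s.
        // retryCount is 0 on the first failure; later retries reuse the last value.
        int[] delaySeconds = { 10, 30, 60 };
        var index = Math.Clamp(retryCount, 0, delaySeconds.Length - 1);
        return TimeSpan.FromSeconds(delaySeconds[index]);
    }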
}
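
One detail in this pattern deserves care: application properties are stored as `object`, and a direct `(int)` cast throws `InvalidCastException` if the value was written as a `long` or a `string`, for example by a producer on another platform. A defensive reader might look like the sketch below. The `RetryProperties` class and `GetRetryCount` method are hypothetical helpers, not part of the Service Bus SDK.

```csharp
using System;
using System.Collections.Generic;

public static class RetryProperties
{
    // Hypothetical helper: reads "RetryCount" from a message's application
    // properties, tolerating a missing key and common alternative value types.
    public static int GetRetryCount(IReadOnlyDictionary<string, object> properties)
    {
        if (!properties.TryGetValue("RetryCount", out var raw) || raw is null)
        {
            return 0; // no retries recorded yet
        }

        return raw switch
        {
            int i => i,
            long l => checked((int)l),
            string s when int.TryParse(s, out var parsed) => parsed,
            _ => 0 // unknown type: treat as a first attempt
        };
    }
}
```

In the function above, this would replace the cast with `RetryProperties.GetRetryCount(message.ApplicationProperties)`.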

Step 6: Error Handling Best Practices

Complete Error Handling Pattern

public class ResilientFunction
{
    private readonly ILogger<ResilientFunction> _logger;
    private readonly IMetricsCollector _metrics;

    public ResilientFunction(
        ILogger<ResilientFunction> logger,
        IMetricsCollector metrics)
    {
        _logger = logger;
        _metrics = metrics;
    }

    [Function("ResilientOrderProcessing")]
    public async Task Run(
        [ServiceBusTrigger("orders", AutoCompleteMessages = false)] 
        ServiceBusReceivedMessage message,
        ServiceBusMessageActions actions)
    {
        var correlationId = message.CorrelationId;
        var deliveryCount = message.DeliveryCount;

        using var scope = _logger.BeginScope("Processing order {CorrelationId}, Attempt {Attempt}",
            correlationId, deliveryCount);

        try
        {
            // Track start time
            var startTime = DateTime.UtcNow;

            // Process
            await ProcessOrderAsync(message.Body.ToObjectFromJson<Order>());

            // Track success metrics
            var duration = DateTime.UtcNow - startTime;
            _metrics.RecordSuccess("orders", duration);

            await actions.CompleteMessageAsync(message);
        }
        catch (ValidationException ex)
        {
            // Permanent failure - don't retry
            _logger.LogError(ex, "Validation failed for order {CorrelationId}. Not retrying.",
                correlationId);

            _metrics.RecordFailure("orders", "validation");

            await actions.DeadLetterMessageAsync(message, new Dictionary<string, object>
            {
                ["ErrorType"] = "Validation",
                ["ErrorMessage"] = ex.Message
            });
        }
        catch (TransientException ex) when (deliveryCount < 5)
        {
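            // Transient failure - abandon so Service Bus redelivers the message
            _logger.LogWarning(ex, "Transient error processing order {CorrelationId}, attempt {Attempt}",
                correlationId, deliveryCount);

            _metrics.RecordFailure("orders", "transient");

            // Abandoned messages become available again immediately; abandoning
            // does not add a delay. The computed backoff is only recorded on the
            // message here. To actually wait, re-send the message with
            // ScheduledEnqueueTime as in Step 5.
            var delay = TimeSpan.FromSeconds(Math.Pow(2, deliveryCount));
            await actions.AbandonMessageAsync(message, new Dictionary<string, object>
            {
                ["AbandonedAt"] = DateTime.UtcNow,
                ["RetryDelay"] = delay
            });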
        }
        catch (Exception ex) when (deliveryCount >= 5)
        {
            // Max retries exceeded
            _logger.LogError(ex, "Max retries exceeded for order {CorrelationId}",
                correlationId);

            _metrics.RecordFailure("orders", "exhausted");

            await actions.DeadLetterMessageAsync(message, new Dictionary<string, object>
            {
                ["ErrorType"] = ex.GetType().Name,
                ["ErrorMessage"] = ex.Message,
                ["FinalAttempt"] = deliveryCount
            });
        }
        catch (Exception ex)
        {
            // Unexpected error
            _logger.LogError(ex, "Unexpected error processing order {CorrelationId}",
                correlationId);

            _metrics.RecordFailure("orders", "unexpected");

            throw; // Let host handle
        }
    }

    private async Task ProcessOrderAsync(Order order)
    {
        // Your business logic
        await Task.Delay(100);
    }
}

Monitoring and Alerts

public class ErrorMonitoringService
{
    private readonly ILogger<ErrorMonitoringService> _logger;
    private readonly IMetricsClient _metricsClient;

    public async Task CheckAndAlertAsync()
    {
        // Get metrics from your monitoring system
        var recentErrors = await _metricsClient.GetErrorsAsync(
            TimeSpan.FromMinutes(15));

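        // Avoid division by zero and force floating-point division
        var errorRate = recentErrors.TotalRequests == 0
            ? 0.0
            : (double)recentErrors.Total / recentErrors.TotalRequests;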

        if (errorRate > 0.1) // >10% error rate
        {
            await SendAlertAsync(
                $"High Error Rate: {errorRate:P1}",
                $"Errors in last 15 minutes: {recentErrors.Total}. " +
                $"Breakdown: {string.Join(", ", recentErrors.ByType)}");
        }

        // Check DLQ depth
        var dlqDepth = await GetQueueDepthAsync("orders/$deadletterqueue");
        
        if (dlqDepth > 100)
        {
            await SendAlertAsync(
                $"DLQ Backlog: {dlqDepth} messages",
                "Dead letter queue has accumulated messages. Investigation required.");
        }
    }

    private async Task SendAlertAsync(string title, string message)
    {
        // Send to your alerting system
        _logger.LogCritical("{Title}: {Message}", title, message);
    }
}

Best Practices Summary

Practice                   | Why                                      | Implementation
---------------------------|------------------------------------------|---------------------------
Distinguish error types    | Different errors need different handling | Transient vs Permanent
Use exponential backoff    | Prevent thundering herd                  | 1s, 2s, 4s, 8s pattern
Set max delivery count     | Prevent infinite retries                 | 3-5 for most cases
Implement circuit breakers | Prevent cascade failures                 | Open → Half-open → Closed
Log extensively            | Debugging is critical                    | Include correlation IDs
Monitor error rates        | Early warning system                     | Set up alerts
Testing Error Handling

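[Fact]
public async Task Retry_ExponentialBackoff_Works()
{
    // Arrange: a short base delay and zero jitter keep the test fast and deterministic
    var backoffService = new ExponentialBackoffService(
        maxRetries: 3,
        baseDelay: TimeSpan.FromMilliseconds(100),
        jitterFactor: 0);

    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    var callTimes = new List<TimeSpan>();

    Task FailingOperation()
    {
        callTimes.Add(stopwatch.Elapsed); // Record when each attempt starts
        return Task.FromException(new TransientException("Test error"));
    }

    // Act & Assert
    await Assert.ThrowsAsync<TransientException>(() =>
        backoffService.ExecuteWithRetryAsync(
            FailingOperation,
            ex => true));

    Assert.Equal(3, callTimes.Count);

    // The wait before attempt 3 (~200 ms) should exceed the wait before attempt 2 (~100 ms)
    var firstDelay = callTimes[1] - callTimes[0];
    var secondDelay = callTimes[2] - callTimes[1];
    Assert.True(secondDelay > firstDelay); // Each delay increases
}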

Conclusion

Robust error handling is essential for serverless applications.

Key takeaways:

  1. Distinguish transient from permanent errors
  2. Implement exponential backoff, not fixed delays
  3. Set reasonable max delivery counts (3-5)
  4. Use circuit breakers for external dependencies
  5. Monitor error rates and DLQ depth
  6. Test your error handling thoroughly

Azure Integration Hub - Functions