Retry Policies & Delivery Count in Azure Service Bus

Overview

In distributed messaging systems, transient failures are inevitable. Network blips, temporary service outages, database connection timeouts, and resource throttling can all cause message processing to fail. A robust retry strategy ensures your system recovers gracefully from these temporary issues without losing messages or requiring manual intervention.

Azure Service Bus provides built-in retry mechanisms at both the client SDK level and the broker level (via delivery count). This guide covers how to configure and combine these mechanisms effectively.


Why Retry Matters in Messaging

Without retry logic, a single transient failure can cause:

  • Message loss — If you complete a message before processing succeeds
  • System stalls — If failed messages block the queue
  • Manual intervention — Operators must investigate and replay messages
  • Data inconsistency — Partially processed operations leave your system in a broken state

A well-designed retry strategy handles transient failures automatically while routing permanently failed messages to a dead-letter queue for investigation.


Built-in Service Bus Retry: MaxDeliveryCount

Service Bus tracks how many times a message has been delivered to a consumer. Each time a message is received and then abandoned (or the lock expires), the delivery count increments. When it exceeds MaxDeliveryCount, the message is automatically moved to the dead-letter queue.

How It Works

1. Message received (DeliveryCount = 1)
2. Processing fails → Abandon message
3. Message returns to queue (DeliveryCount = 2)
4. Processing fails again → Abandon message
5. Message returns to queue (DeliveryCount = 3)
6. The cycle repeats until the next delivery would push DeliveryCount past MaxDeliveryCount (default: 10)
7. Message automatically moved to the dead-letter queue
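
The sequence above can be modeled with a small in-memory simulation. This is only a sketch of the counting rule described in this section, not the broker's implementation; `DeliveryCountModel`, `SimulateDeliveries`, and `Outcome` are hypothetical names introduced for illustration.

```csharp
using System;

// Simulate a handler that always fails against a queue with MaxDeliveryCount = 3.
var result = DeliveryCountModel.SimulateDeliveries(maxDeliveryCount: 3, handler: _ => false);
Console.WriteLine($"Deliveries: {result.Deliveries}, dead-lettered: {result.DeadLettered}");
// prints "Deliveries: 3, dead-lettered: True"

public static class DeliveryCountModel
{
    public readonly record struct Outcome(int Deliveries, bool DeadLettered);

    // handler receives the current DeliveryCount (starting at 1) and returns
    // true to complete the message or false to abandon it.
    public static Outcome SimulateDeliveries(int maxDeliveryCount, Func<int, bool> handler)
    {
        if (maxDeliveryCount < 1) throw new ArgumentOutOfRangeException(nameof(maxDeliveryCount));

        for (int deliveryCount = 1; deliveryCount <= maxDeliveryCount; deliveryCount++)
        {
            if (handler(deliveryCount))
                return new Outcome(deliveryCount, DeadLettered: false); // completed

            // Abandoned: the message becomes available again with a higher count.
        }

        // Every allowed delivery was abandoned, so the message is dead-lettered.
        return new Outcome(maxDeliveryCount, DeadLettered: true);
    }
}
```

A handler that succeeds on the second delivery (`n => n == 2`) returns after two deliveries without dead-lettering, which is the case MaxDeliveryCount is meant to allow.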

Configuring MaxDeliveryCount

Via C# SDK

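using System;
using Azure.Messaging.ServiceBus.Administration;

// connectionString is assumed to hold a connection string with Manage rights on the namespace
var adminClient = new ServiceBusAdministrationClient(connectionString);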

// Create a new queue with custom delivery count
var queueOptions = new CreateQueueOptions("order-processing")
{
    MaxDeliveryCount = 5,
    LockDuration = TimeSpan.FromMinutes(2),
    DeadLetteringOnMessageExpiration = true
};

await adminClient.CreateQueueAsync(queueOptions);

// Update an existing queue
var queueProperties = await adminClient.GetQueueAsync("order-processing");
queueProperties.Value.MaxDeliveryCount = 3;
await adminClient.UpdateQueueAsync(queueProperties.Value);

Via Azure CLI

# Create queue with max delivery count of 5
az servicebus queue create \
  --resource-group myRG \
  --namespace-name myNamespace \
  --name order-processing \
  --max-delivery-count 5 \
  --lock-duration PT2M

# Update existing queue
az servicebus queue update \
  --resource-group myRG \
  --namespace-name myNamespace \
  --name order-processing \
  --max-delivery-count 3

Choosing the Right MaxDeliveryCount

| Scenario | Recommended Value | Reasoning |
|---|---|---|
| Idempotent operations | 3–5 | Quick failure detection, safe to retry |
| External API calls | 5–10 | Allow time for transient outages to resolve |
| Critical financial transactions | 1–2 | Fail fast, investigate manually |
| Batch processing | 10+ | Allow many retries before dead-lettering |

Client-Level Retry (SDK RetryOptions)

The Service Bus SDK includes built-in retry logic for transient communication errors (network timeouts, throttling, server busy). This is separate from the broker-level delivery count.

using Azure.Messaging.ServiceBus;

var retryOptions = new ServiceBusRetryOptions
{
    Mode = ServiceBusRetryMode.Exponential,
    MaxRetries = 5,
    Delay = TimeSpan.FromSeconds(1),        // Initial delay
    MaxDelay = TimeSpan.FromSeconds(60),    // Cap on delay
    TryTimeout = TimeSpan.FromSeconds(30)   // Timeout per attempt
};

var clientOptions = new ServiceBusClientOptions
{
    RetryOptions = retryOptions
};

var client = new ServiceBusClient(connectionString, clientOptions);

Retry Modes

| Mode | Behavior | Best For |
|---|---|---|
| Exponential | Delay doubles each retry: 1s → 2s → 4s → 8s... | Most scenarios: prevents thundering herd |
| Fixed | Constant delay between retries | Predictable timing requirements |
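
To compare the two modes, the nominal delay before each retry can be computed by hand. The sketch below assumes plain doubling capped at the maximum delay; the SDK's actual calculation may add jitter, so treat the numbers as approximate. `BackoffSketch` and `NominalDelay` are hypothetical names, not SDK APIs.

```csharp
using System;

var initial = TimeSpan.FromSeconds(1);   // corresponds to Delay
var cap = TimeSpan.FromSeconds(60);      // corresponds to MaxDelay

for (int retry = 1; retry <= 5; retry++)
{
    var exponential = BackoffSketch.NominalDelay(true, retry, initial, cap);
    var fixedDelay = BackoffSketch.NominalDelay(false, retry, initial, cap);
    Console.WriteLine($"retry {retry}: exponential={exponential.TotalSeconds}s, fixed={fixedDelay.TotalSeconds}s");
}

public static class BackoffSketch
{
    // retry is 1-based. Exponential: delay * 2^(retry - 1), capped at maxDelay.
    // Fixed: always the configured delay.
    public static TimeSpan NominalDelay(bool exponential, int retry, TimeSpan delay, TimeSpan maxDelay)
    {
        if (retry < 1) throw new ArgumentOutOfRangeException(nameof(retry));
        if (!exponential) return delay;

        double seconds = delay.TotalSeconds * Math.Pow(2, retry - 1);
        return TimeSpan.FromSeconds(Math.Min(seconds, maxDelay.TotalSeconds));
    }
}
```

With a 1-second initial delay the exponential schedule reaches the 60-second cap by the seventh retry, which is why MaxDelay matters once MaxRetries is raised.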

Important: SDK retry handles communication failures (connecting to Service Bus). It does NOT retry your message processing logic. For that, you need application-level retry.


Application-Level Retry with Polly

For retrying your own processing logic (database calls, HTTP requests, etc.), use the Polly resilience library:

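using System;
using System.Net.Http;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Logging;
using Polly;
using Polly.Retry;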

public class OrderProcessor
{
    private readonly ServiceBusProcessor _processor;
    private readonly AsyncRetryPolicy _retryPolicy;
    private readonly ILogger<OrderProcessor> _logger;

    public OrderProcessor(ServiceBusClient client, ILogger<OrderProcessor> logger)
    {
        _logger = logger;

        // Define retry policy: 3 retries with exponential backoff
        _retryPolicy = Policy
            .Handle<HttpRequestException>()
            .Or<TimeoutException>()
            .Or<SqlException>(ex => IsTransient(ex))
            .WaitAndRetryAsync(
                retryCount: 3,
                sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
                onRetry: (exception, delay, attempt, context) =>
                {
                    _logger.LogWarning(
                        "Retry {Attempt} after {Delay}s due to: {Error}",
                        attempt, delay.TotalSeconds, exception.Message);
                });

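        // Disable auto-completion because the handler settles each message explicitly
        _processor = client.CreateProcessor("order-processing",
            new ServiceBusProcessorOptions { AutoCompleteMessages = false });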
        _processor.ProcessMessageAsync += HandleMessageAsync;
        _processor.ProcessErrorAsync += HandleErrorAsync;
    }

    private async Task HandleMessageAsync(ProcessMessageEventArgs args)
    {
        var order = args.Message.Body.ToObjectFromJson<Order>();

        try
        {
            // Retry transient failures within this delivery attempt
            await _retryPolicy.ExecuteAsync(async () =>
            {
                await ProcessOrderAsync(order);
            });

            // Success — complete the message
            await args.CompleteMessageAsync(args.Message);
        }
        catch (Exception ex) when (IsTransient(ex))
        {
            // All retries exhausted but still transient — abandon for redelivery
            _logger.LogWarning("Transient failure after retries, abandoning for redelivery: {Error}", ex.Message);
            await args.AbandonMessageAsync(args.Message);
        }
        catch (Exception ex)
        {
            // Permanent failure — dead-letter immediately
            _logger.LogError(ex, "Permanent failure processing order {OrderId}", order.Id);
            await args.DeadLetterMessageAsync(args.Message,
                deadLetterReason: "PermanentProcessingFailure",
                deadLetterErrorDescription: ex.Message);
        }
    }

    // Called for errors raised by the SDK or thrown from the message handler
    private Task HandleErrorAsync(ProcessErrorEventArgs args)
    {
        _logger.LogError(args.Exception, "Service Bus error from {Source} on {Entity}",
            args.ErrorSource, args.EntityPath);
        return Task.CompletedTask;
    }

    private static bool IsTransient(Exception ex) =>
        ex is HttpRequestException or TimeoutException or SqlException { Number: 1205 or -2 or 40613 };
}
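
To make the behavior of the retry policy concrete, here is a hand-written approximation of a wait-and-retry loop. It is not Polly's implementation, only a sketch of the same idea: `RetrySketch.ExecuteAsync` and its parameters are hypothetical, and the delays are shortened to milliseconds so the example finishes quickly.

```csharp
using System;
using System.Threading.Tasks;

int calls = 0;
await RetrySketch.ExecuteAsync(
    action: () =>
    {
        calls++;
        // Fail the first two attempts with a transient error, then succeed.
        if (calls < 3) throw new TimeoutException("simulated timeout");
        return Task.CompletedTask;
    },
    retryCount: 3,
    isTransient: ex => ex is TimeoutException,
    sleepDurationProvider: attempt => TimeSpan.FromMilliseconds(10 * Math.Pow(2, attempt)));

Console.WriteLine($"Succeeded after {calls} attempts"); // prints "Succeeded after 3 attempts"

public static class RetrySketch
{
    // Runs action once, then retries up to retryCount more times on transient errors.
    public static async Task ExecuteAsync(
        Func<Task> action,
        int retryCount,
        Func<Exception, bool> isTransient,
        Func<int, TimeSpan> sleepDurationProvider)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await action();
                return;
            }
            catch (Exception ex) when (attempt <= retryCount && isTransient(ex))
            {
                // Transient and retries remain: wait, then loop again.
                await Task.Delay(sleepDurationProvider(attempt));
            }
            // Non-transient errors, or a failure on the final attempt, propagate to the caller.
        }
    }
}
```

Note that a retry count of 3 means up to four executions of the action. The exception that escapes after the last attempt is what the message handler sees when it decides whether to abandon or dead-letter.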

Distinguishing Transient vs Permanent Failures

This is the most critical decision in your retry logic. Retrying a permanent failure wastes resources and delays dead-lettering.

| Failure Type | Examples | Action |
|---|---|---|
| Transient | Network timeout, HTTP 429/503, SQL deadlock, connection reset | Retry with backoff |
| Permanent | Invalid message format, business rule violation, HTTP 400/404, missing required data | Dead-letter immediately |
| Unknown | Unexpected exceptions | Abandon (let delivery count handle it) |

Implementation Pattern

private async Task HandleMessageAsync(ProcessMessageEventArgs args)
{
    try
    {
        await ProcessAsync(args.Message);
        await args.CompleteMessageAsync(args.Message);
    }
    catch (ValidationException ex)
    {
        // Permanent: bad data, will never succeed
        await args.DeadLetterMessageAsync(args.Message,
            deadLetterReason: "ValidationFailed",
            deadLetterErrorDescription: ex.Message);
    }
    catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
    {
        // Permanent: resource doesn't exist
        await args.DeadLetterMessageAsync(args.Message,
            deadLetterReason: "ResourceNotFound",
            deadLetterErrorDescription: ex.Message);
    }
    catch (Exception ex) when (IsTransient(ex))
    {
        // Transient: abandon for redelivery
        await args.AbandonMessageAsync(args.Message,
            new Dictionary<string, object> { ["LastError"] = ex.Message });
    }
    catch (Exception ex)
    {
        // Unknown: log and abandon, let MaxDeliveryCount protect us
        _logger.LogError(ex, "Unexpected error processing message {Id}", args.Message.MessageId);
        await args.AbandonMessageAsync(args.Message);
    }
}
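
The classification can also be pulled into one function so that every handler applies the same rules. The following is a minimal sketch under stated assumptions: `FailureKind` and `FailureClassifier` are hypothetical names, treating `FormatException` as "invalid message format" is an illustrative choice, and `HttpRequestException.StatusCode` requires .NET 5 or later.

```csharp
using System;
using System.Net;
using System.Net.Http;

Console.WriteLine(FailureClassifier.Classify(
    new HttpRequestException("throttled", null, HttpStatusCode.TooManyRequests))); // Transient
Console.WriteLine(FailureClassifier.Classify(new FormatException("bad payload"))); // Permanent
Console.WriteLine(FailureClassifier.Classify(new InvalidOperationException()));    // Unknown

public enum FailureKind { Transient, Permanent, Unknown }

public static class FailureClassifier
{
    public static FailureKind Classify(Exception ex) => ex switch
    {
        TimeoutException => FailureKind.Transient,

        // HTTP 429 and 503 signal throttling or a temporary outage.
        HttpRequestException { StatusCode: HttpStatusCode.TooManyRequests or HttpStatusCode.ServiceUnavailable }
            => FailureKind.Transient,

        // HTTP 400 and 404 will not succeed on a retry with the same input.
        HttpRequestException { StatusCode: HttpStatusCode.BadRequest or HttpStatusCode.NotFound }
            => FailureKind.Permanent,

        // No status code usually means no response arrived (connection reset, DNS failure).
        HttpRequestException { StatusCode: null } => FailureKind.Transient,

        // Assumption: malformed payloads surface as FormatException in this application.
        FormatException => FailureKind.Permanent,

        _ => FailureKind.Unknown
    };
}
```

A handler can then switch on the result: retry or abandon for Transient, dead-letter for Permanent, and log plus abandon for Unknown.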

Retry Configuration for Azure Functions

When using Service Bus triggers in Azure Functions, configure retry behavior in host.json. The settings below use the schema of the Service Bus extension 5.x and later, the version built on Azure.Messaging.ServiceBus; extension 4.x nests the equivalent settings under messageHandlerOptions and batchOptions with different names.

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "autoCompleteMessages": false,
      "maxConcurrentCalls": 16,
      "maxAutoLockRenewalDuration": "00:05:00",
      "maxMessageBatchSize": 10,
      "clientRetryOptions": {
        "tryTimeout": "00:01:00"
      }
    }
  },
  "retry": {
    "strategy": "exponentialBackoff",
    "maxRetryCount": 3,
    "minimumInterval": "00:00:02",
    "maximumInterval": "00:00:30"
  }
}

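Note: The retry section in host.json controls function-level retry (re-invoking the function). This is in addition to the Service Bus delivery count, so be careful not to multiply retries unintentionally. Retry policies are not supported for every trigger type, so confirm in the Azure Functions documentation for your runtime and extension version that they apply to Service Bus triggers before relying on them.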

Function-Level Retry Attribute

[Function("ProcessOrder")]
[ExponentialBackoffRetry(3, "00:00:02", "00:00:30")]
public async Task Run(
    [ServiceBusTrigger("order-processing", Connection = "ServiceBusConnection")]
    ServiceBusReceivedMessage message,
    ServiceBusMessageActions messageActions)
{
    var order = message.Body.ToObjectFromJson<Order>();
    await ProcessOrderAsync(order);
    await messageActions.CompleteMessageAsync(message);
}

Dead-Letter Queue Processing

After all retries are exhausted, messages land in the dead-letter queue. You need a strategy to handle them:

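public class DeadLetterProcessor
{
    private readonly ILogger<DeadLetterProcessor> _logger;
    // IAlertService is a hypothetical application service that notifies operators
    private readonly IAlertService _alertService;

    public DeadLetterProcessor(ILogger<DeadLetterProcessor> logger, IAlertService alertService)
    {
        _logger = logger;
        _alertService = alertService;
    }

    [Function("ProcessDeadLetters")]
    public async Task Run(
        [ServiceBusTrigger("order-processing/$deadletterqueue", Connection = "ServiceBusConnection")]
        ServiceBusReceivedMessage message,
        ServiceBusMessageActions messageActions)
    {
        var deadLetterReason = message.DeadLetterReason;
        var errorDescription = message.DeadLetterErrorDescription;
        var deliveryCount = message.DeliveryCount;

        _logger.LogError(
            "Dead-lettered message {Id}: Reason={Reason}, Error={Error}, Deliveries={Count}",
            message.MessageId, deadLetterReason, errorDescription, deliveryCount);

        // Notify operators so the message can be investigated, then remove it from the DLQ
        await _alertService.NotifyDeadLetterAsync(message);
        await messageActions.CompleteMessageAsync(message);
    }
}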
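
One possible handling strategy is to triage dead-lettered messages by their reason. The sketch below assumes the application-defined reasons used earlier in this guide plus the broker-assigned reason `MaxDeliveryCountExceeded`; the `TriageAction` values and the `DeadLetterTriage` class are hypothetical and would need to match your own operational process.

```csharp
using System;

Console.WriteLine(DeadLetterTriage.Decide("MaxDeliveryCountExceeded")); // ReviewThenResubmit
Console.WriteLine(DeadLetterTriage.Decide("ValidationFailed"));         // FixDataManually
Console.WriteLine(DeadLetterTriage.Decide(null));                       // Investigate

public enum TriageAction { ReviewThenResubmit, FixDataManually, Investigate }

public static class DeadLetterTriage
{
    public static TriageAction Decide(string deadLetterReason) => deadLetterReason switch
    {
        // Set by the broker when deliveries are exhausted: often a transient
        // outage that lasted too long, so resubmitting may succeed.
        "MaxDeliveryCountExceeded" => TriageAction.ReviewThenResubmit,

        // Reasons set by the handlers in this guide for permanent failures.
        "ValidationFailed" or "ResourceNotFound" or "PermanentProcessingFailure"
            => TriageAction.FixDataManually,

        // Anything else (including a missing reason) needs a human to look at it.
        _ => TriageAction.Investigate
    };
}
```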

Real-World Retry Patterns

Pattern 1: Circuit Breaker + Retry

When a downstream service is completely down, retrying every message wastes resources. Combine retry with a circuit breaker:

var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(
        exceptionsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromMinutes(1));

var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

// Wrap retry inside circuit breaker (the first argument to WrapAsync is the outermost policy)
var combinedPolicy = Policy.WrapAsync(circuitBreakerPolicy, retryPolicy);
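
For readers unfamiliar with the circuit breaker itself, its state handling can be sketched in a few lines. This is a simplified, single-threaded illustration and not Polly's implementation; `SimpleCircuitBreaker` and its members are hypothetical, and the clock is injected so the example does not have to wait a real minute.

```csharp
using System;

var now = DateTimeOffset.UtcNow;
var breaker = new SimpleCircuitBreaker(
    failuresBeforeBreaking: 5,
    breakDuration: TimeSpan.FromMinutes(1),
    clock: () => now);

for (int i = 0; i < 5; i++) breaker.RecordFailure();
Console.WriteLine(breaker.AllowRequest());   // False: the circuit is open

now = now.AddMinutes(2);                     // the break duration has elapsed
Console.WriteLine(breaker.AllowRequest());   // True: a trial call is allowed

public sealed class SimpleCircuitBreaker
{
    private readonly int _failuresBeforeBreaking;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTimeOffset> _clock;
    private int _consecutiveFailures;
    private DateTimeOffset? _openedAt;

    public SimpleCircuitBreaker(int failuresBeforeBreaking, TimeSpan breakDuration, Func<DateTimeOffset> clock)
    {
        _failuresBeforeBreaking = failuresBeforeBreaking;
        _breakDuration = breakDuration;
        _clock = clock;
    }

    // Calls may proceed while the circuit is closed or once the break has elapsed.
    public bool AllowRequest() =>
        _openedAt is null || _clock() - _openedAt.Value >= _breakDuration;

    // A success closes the circuit and resets the failure count.
    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        _openedAt = null;
    }

    // Reaching the threshold opens (or re-opens) the circuit from the current time.
    public void RecordFailure()
    {
        _consecutiveFailures++;
        if (_consecutiveFailures >= _failuresBeforeBreaking)
            _openedAt = _clock();
    }
}
```

While the circuit is open, a message handler would typically abandon the message without calling the downstream service at all, so deliveries are not spent on calls that are known to fail.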

Pattern 2: Scheduled Retry with Scheduled Messages

For failures that need longer delays between retries (e.g., waiting for an external system to recover):

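catch (Exception ex) when (IsTransient(ex))
{
    // The rescheduled copy is a new message whose DeliveryCount restarts at 1,
    // so track attempts in an application property instead.
    var retryCount = message.ApplicationProperties.TryGetValue("RetryCount", out var value)
        ? Convert.ToInt32(value) : 0;
    if (retryCount >= 2)
    {
        await args.AbandonMessageAsync(message); // fall back to MaxDeliveryCount
        return;
    }

    // Schedule retry with increasing delay: 5 minutes, then 25 minutes
    var delay = TimeSpan.FromMinutes(Math.Pow(5, retryCount + 1));
    var retryMessage = new ServiceBusMessage(message.Body)
    {
        ScheduledEnqueueTime = DateTimeOffset.UtcNow.Add(delay),
        ApplicationProperties =
        {
            ["OriginalMessageId"] = message.MessageId,
            ["RetryCount"] = retryCount + 1
        }
    };

    await sender.SendMessageAsync(retryMessage);
    await args.CompleteMessageAsync(message);
}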

Best Practices

  1. Always set autoComplete: false. Take explicit control over when messages are completed, abandoned, or dead-lettered.

  2. Classify failures early. Determine whether a failure is transient or permanent as close to the error source as possible, and don't retry permanent failures.

  3. Add jitter to backoff. This prevents a thundering herd when many consumers retry simultaneously:

var jitter = TimeSpan.FromMilliseconds(Random.Shared.Next(0, 1000));
var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt)) + jitter;

  4. Log delivery count. Always include message.DeliveryCount in your logs to track retry progression.

  5. Set a reasonable lock duration. Ensure the lock duration exceeds your total retry time within a single delivery. If the lock expires, the message becomes available for redelivery and the delivery count increments, possibly while the first handler is still running.

  6. Don't multiply retries. Each layer multiplies the others: 3 Polly retries (4 attempts) × 3 function retries (4 invocations) × a MaxDeliveryCount of 5 allows up to 80 processing attempts for a single message. Be intentional about each layer.

  7. Use dead-letter reason and description. Always provide meaningful context when dead-lettering so operators can diagnose issues without reading code.

  8. Monitor dead-letter queue depth. Set up alerts when the DLQ grows, which indicates a systemic issue.
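
The interaction between retry layers and lock duration is easy to check with a little arithmetic. The helper below is a rough sketch using the example values from this guide; `RetryBudget` and its methods are hypothetical names, and the 20-second processing time per attempt is an assumed figure. It counts each "N retries" setting as N + 1 attempts, because the first try is not a retry.

```csharp
using System;
using System.Linq;

int attempts = RetryBudget.TotalAttempts(pollyRetries: 3, functionRetries: 3, maxDeliveryCount: 5);
TimeSpan backoff = RetryBudget.TotalBackoff(
    retries: 3,
    sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

Console.WriteLine($"Worst-case processing attempts: {attempts}");           // 80
Console.WriteLine($"Backoff within one delivery: {backoff.TotalSeconds}s"); // 14s (2 + 4 + 8)

// Assumed: each processing attempt takes about 20 seconds, with a 2-minute lock.
bool fits = RetryBudget.FitsWithinLock(
    lockDuration: TimeSpan.FromMinutes(2),
    totalBackoff: backoff,
    perAttemptWork: TimeSpan.FromSeconds(20),
    attempts: 4);
Console.WriteLine($"Fits within lock: {fits}");                             // True (94s < 120s)

public static class RetryBudget
{
    // N retries means N + 1 attempts at that layer; deliveries are counted directly.
    public static int TotalAttempts(int pollyRetries, int functionRetries, int maxDeliveryCount) =>
        (pollyRetries + 1) * (functionRetries + 1) * maxDeliveryCount;

    // Sum of the waits before retry 1..retries.
    public static TimeSpan TotalBackoff(int retries, Func<int, TimeSpan> sleepDurationProvider) =>
        TimeSpan.FromTicks(Enumerable.Range(1, retries).Sum(attempt => sleepDurationProvider(attempt).Ticks));

    // True when backoff plus processing time stays under the lock duration.
    public static bool FitsWithinLock(TimeSpan lockDuration, TimeSpan totalBackoff, TimeSpan perAttemptWork, int attempts) =>
        totalBackoff + perAttemptWork * attempts < lockDuration;
}
```

If the check fails, either shorten the in-process retries, raise the lock duration, or rely on automatic lock renewal.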


Common Pitfalls

| Pitfall | Impact | Solution |
|---|---|---|
| Retrying permanent failures | Delays dead-lettering, wastes resources | Classify errors and dead-letter immediately for permanent failures |
| Lock expiration during retry | Unexpected redelivery, duplicate processing | Increase lock duration or reduce retry count |
| No dead-letter monitoring | Failed messages accumulate silently | Alert on DLQ message count > 0 |
| Infinite retry loops | Message never reaches DLQ | Always have MaxDeliveryCount as a safety net |
| Completing before processing | Message lost on failure | Set autoComplete: false, complete explicitly after success |
