Retry Policies & Delivery Count in Azure Service Bus

Overview

In distributed messaging systems, transient failures are inevitable. Network blips, temporary service outages, database connection timeouts, and resource throttling can all cause message processing to fail. A robust retry strategy ensures your system recovers gracefully from these temporary issues without losing messages or requiring manual intervention.

Azure Service Bus provides built-in retry mechanisms at both the client SDK level and the broker level (via delivery count). This guide covers how to configure and combine these mechanisms effectively.

Why Retry Matters in Messaging

Without retry logic, a single transient failure can cause:

Message loss — If you complete a message before processing succeeds
System stalls — If failed messages block the queue
Manual intervention — Operators must investigate and replay messages
Data inconsistency — Partially processed operations leave your system in a broken state

A well-designed retry strategy handles transient failures automatically while routing permanently failed messages to a dead-letter queue for investigation.

Built-in Service Bus Retry: MaxDeliveryCount

Service Bus tracks how many times a message has been delivered to a consumer. Each time a message received in peek-lock mode is abandoned (or its lock expires), the delivery count increments. When the count exceeds MaxDeliveryCount, the message is automatically moved to the dead-letter queue with the reason MaxDeliveryCountExceeded, so a message is delivered at most MaxDeliveryCount times.

How It Works

1. Message received (DeliveryCount = 1)
2. Processing fails → Abandon message
3. Message returns to queue and is redelivered (DeliveryCount = 2)
4. Processing fails again → Abandon message
5. The receive/abandon cycle repeats, incrementing DeliveryCount each time
6. The delivery with DeliveryCount = MaxDeliveryCount (default: 10) is abandoned, so the count exceeds the limit
7. Message automatically moved to Dead-Letter Queue
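To make the counting rule concrete, here is a minimal C# model of it, assuming a consumer that abandons every delivery. `DeliveryCountSimulation` and `DeliverUntilDeadLettered` are hypothetical names; the code only mimics the broker's bookkeeping and does not call the Service Bus SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of broker-side delivery counting; not part of the Service Bus SDK.
public static class DeliveryCountSimulation
{
    // Returns the DeliveryCount values a consumer would observe before the
    // message is dead-lettered, assuming every delivery is abandoned.
    public static List<int> DeliverUntilDeadLettered(int maxDeliveryCount)
    {
        var observed = new List<int>();
        int deliveryCount = 1; // the first delivery is reported as 1

        while (true)
        {
            observed.Add(deliveryCount); // consumer receives the message
            deliveryCount++;             // abandon (or lock expiry) increments the count

            if (deliveryCount > maxDeliveryCount)
            {
                break;                   // count exceeds the limit: moved to the DLQ
            }
        }

        return observed;
    }
}
```

Under this model, MaxDeliveryCount = 1 means a single delivery with no broker-level retries, and the default of 10 allows nine redeliveries.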

Configuring MaxDeliveryCount

Via C# SDK

using Azure.Messaging.ServiceBus.Administration;

var adminClient = new ServiceBusAdministrationClient(connectionString);

// Create a new queue with custom delivery count
var queueOptions = new CreateQueueOptions("order-processing")
{
    MaxDeliveryCount = 5,
    LockDuration = TimeSpan.FromMinutes(2),
    DeadLetteringOnMessageExpiration = true
};

await adminClient.CreateQueueAsync(queueOptions);

// Update an existing queue
var queueProperties = await adminClient.GetQueueAsync("order-processing");
queueProperties.Value.MaxDeliveryCount = 3;
await adminClient.UpdateQueueAsync(queueProperties.Value);

Via Azure CLI

# Create queue with max delivery count of 5
az servicebus queue create \
  --resource-group myRG \
  --namespace-name myNamespace \
  --name order-processing \
  --max-delivery-count 5 \
  --lock-duration PT2M

# Update existing queue
az servicebus queue update \
  --resource-group myRG \
  --namespace-name myNamespace \
  --name order-processing \
  --max-delivery-count 3

Choosing the Right MaxDeliveryCount

Scenario	Recommended Value	Reasoning
Idempotent operations	3–5	Quick failure detection, safe to retry
External API calls	5–10	Allow time for transient outages to resolve
Critical financial transactions	1–2	Fail fast, investigate manually
Batch processing	10+	Allow many retries before dead-lettering

Client-Level Retry (SDK RetryOptions)

The Service Bus SDK includes built-in retry logic for transient communication errors (network timeouts, throttling, server busy). This is separate from the broker-level delivery count.

using Azure.Messaging.ServiceBus;

var retryOptions = new ServiceBusRetryOptions
{
    Mode = ServiceBusRetryMode.Exponential,
    MaxRetries = 5,
    Delay = TimeSpan.FromSeconds(1),        // Initial delay
    MaxDelay = TimeSpan.FromSeconds(60),    // Cap on delay
    TryTimeout = TimeSpan.FromSeconds(30)   // Timeout per attempt
};

var clientOptions = new ServiceBusClientOptions
{
    RetryOptions = retryOptions
};

var client = new ServiceBusClient(connectionString, clientOptions);

Retry Modes

Mode	Behavior	Best For
`Exponential`	Delay roughly doubles each retry (1s → 2s → 4s → 8s...), with random jitter, up to MaxDelay	Most scenarios; backs off a struggling service and the jitter helps avoid a thundering herd
`Fixed`	Constant delay between retries	Predictable timing requirements
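The two modes are easiest to compare as a delay schedule. This sketch computes idealized delays from `Delay` and `MaxDelay`; it deliberately leaves out the random jitter the SDK applies in exponential mode, so real delays will vary around these values. `BackoffSchedule` and `BackoffMode` are hypothetical, not SDK types.

```csharp
using System;
using System.Collections.Generic;

public enum BackoffMode { Fixed, Exponential }

// Hypothetical helper that approximates retry delays (no jitter).
public static class BackoffSchedule
{
    public static List<TimeSpan> Compute(BackoffMode mode, int maxRetries, TimeSpan delay, TimeSpan maxDelay)
    {
        var delays = new List<TimeSpan>();
        for (int retry = 0; retry < maxRetries; retry++)
        {
            // Exponential: delay * 2^retry; Fixed: the same delay every time
            double seconds = mode == BackoffMode.Exponential
                ? delay.TotalSeconds * Math.Pow(2, retry)
                : delay.TotalSeconds;

            // Never wait longer than maxDelay
            delays.Add(TimeSpan.FromSeconds(Math.Min(seconds, maxDelay.TotalSeconds)));
        }
        return delays;
    }
}
```

With the options shown earlier (Delay = 1 s, MaxDelay = 60 s, MaxRetries = 5), the idealized exponential schedule is 1, 2, 4, 8, 16 seconds; the cap only matters once a doubled delay would pass 60 seconds.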

Important: SDK retry handles communication failures (connecting to Service Bus). It does NOT retry your message processing logic. For that, you need application-level retry.

Application-Level Retry with Polly

For retrying your own processing logic (database calls, HTTP requests, etc.), use the Polly resilience library:

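using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Logging;
using Polly;
using Polly.Retry;

public class OrderProcessor
{
    private readonly ServiceBusProcessor _processor;
    private readonly AsyncRetryPolicy _retryPolicy;
    private readonly ILogger<OrderProcessor> _logger;

    public OrderProcessor(ServiceBusClient client, ILogger<OrderProcessor> logger)
    {
        _logger = logger;

        // Retry only what IsTransient classifies as transient: 3 retries, waiting 2s, 4s, 8s
        _retryPolicy = Policy
            .Handle<Exception>(IsTransient)
            .WaitAndRetryAsync(
                retryCount: 3,
                sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
                onRetry: (exception, delay, attempt, context) =>
                {
                    _logger.LogWarning(
                        "Retry {Attempt} after {Delay}s due to: {Error}",
                        attempt, delay.TotalSeconds, exception.Message);
                });

        // Settle messages explicitly instead of letting the processor auto-complete them
        _processor = client.CreateProcessor("order-processing",
            new ServiceBusProcessorOptions { AutoCompleteMessages = false });
        _processor.ProcessMessageAsync += HandleMessageAsync;
        _processor.ProcessErrorAsync += HandleErrorAsync;
    }

    public Task StartAsync() => _processor.StartProcessingAsync();

    private async Task HandleMessageAsync(ProcessMessageEventArgs args)
    {
        try
        {
            // Deserialize inside the try so a malformed body is dead-lettered, not retried
            var order = args.Message.Body.ToObjectFromJson<Order>();

            // Retry transient failures within this delivery attempt
            await _retryPolicy.ExecuteAsync(() => ProcessOrderAsync(order));

            // Success: complete the message
            await args.CompleteMessageAsync(args.Message);
        }
        catch (Exception ex) when (IsTransient(ex))
        {
            // All retries exhausted but still transient: abandon for redelivery
            _logger.LogWarning("Transient failure after retries, abandoning for redelivery: {Error}", ex.Message);
            await args.AbandonMessageAsync(args.Message);
        }
        catch (Exception ex)
        {
            // Permanent failure: dead-letter immediately
            _logger.LogError(ex, "Permanent failure processing message {MessageId}", args.Message.MessageId);
            await args.DeadLetterMessageAsync(args.Message,
                deadLetterReason: "PermanentProcessingFailure",
                deadLetterErrorDescription: ex.Message);
        }
    }

    // Errors raised by the processor itself (connection loss, settlement failures)
    private Task HandleErrorAsync(ProcessErrorEventArgs args)
    {
        _logger.LogError(args.Exception, "Service Bus error from {Source}", args.ErrorSource);
        return Task.CompletedTask;
    }

    // Placeholder for your business logic (database calls, HTTP requests, etc.)
    private Task ProcessOrderAsync(Order order) => Task.CompletedTask;

    // Timeouts, HTTP 429/503 or no response at all, and SQL deadlock (1205),
    // timeout (-2) and database unavailable (40613)
    private static bool IsTransient(Exception ex) =>
        ex is TimeoutException
        or HttpRequestException { StatusCode: null or HttpStatusCode.TooManyRequests or HttpStatusCode.ServiceUnavailable }
        or SqlException { Number: 1205 or -2 or 40613 };
}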
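To see what the Polly policy does inside a single delivery (or if you prefer not to depend on Polly), the same idea can be sketched by hand. This is not Polly's implementation: `TransientRetry.ExecuteAsync` is a hypothetical helper that retries only exceptions the caller classifies as transient, sleeps `baseDelay * 2^attempt` between tries, and rethrows once retries run out so the message handler can choose between abandon and dead-letter.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a Polly WaitAndRetryAsync policy (not Polly's code).
public static class TransientRetry
{
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxRetries,
        TimeSpan baseDelay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (isTransient(ex) && attempt <= maxRetries)
            {
                // Exponential backoff: baseDelay * 2^attempt (2s, 4s, 8s for a 1s base)
                await Task.Delay(TimeSpan.FromTicks(baseDelay.Ticks * (1L << attempt)));
            }
            // Non-transient exceptions, or the last transient one, propagate to the caller
        }
    }
}
```

Passing a zero `baseDelay` keeps tests fast; in production, a 1-second base reproduces the 2 s, 4 s, 8 s waits of the policy above. With `maxRetries = 3` the operation runs at most four times.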

Distinguishing Transient vs Permanent Failures

This is the most critical decision in your retry logic. Retrying a permanent failure wastes resources and delays dead-lettering.

Failure Type	Examples	Action
Transient	Network timeout, HTTP 429/503, SQL deadlock, connection reset	Retry with backoff
Permanent	Invalid message format, business rule violation, HTTP 400/404, missing required data	Dead-letter immediately
Unknown	Unexpected exceptions	Abandon (let delivery count handle it)
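The HTTP and format rows of the table can be expressed as a small classification function. This is a sketch that follows the table's own examples (429 and 503 transient, 400 and 404 permanent, a request that got no response transient, unparseable JSON permanent); `FailureKind` and `FailureClassifier` are hypothetical names, and your dependencies may call for a different mapping, for example treating 502 or 504 as transient.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text.Json;

public enum FailureKind { Transient, Permanent, Unknown }

// Hypothetical classifier mirroring the table above; adjust status codes to your dependencies.
public static class FailureClassifier
{
    public static FailureKind Classify(Exception ex) => ex switch
    {
        // No status code: the request never got a response (e.g. connection reset)
        HttpRequestException { StatusCode: null } => FailureKind.Transient,
        HttpRequestException { StatusCode: HttpStatusCode.TooManyRequests or HttpStatusCode.ServiceUnavailable }
            => FailureKind.Transient,
        HttpRequestException { StatusCode: HttpStatusCode.BadRequest or HttpStatusCode.NotFound }
            => FailureKind.Permanent,
        TimeoutException => FailureKind.Transient,
        JsonException => FailureKind.Permanent, // malformed message body
        _ => FailureKind.Unknown
    };
}
```

A handler can then dead-letter `Permanent` failures and abandon both `Transient` and `Unknown` ones, leaving MaxDeliveryCount as the backstop for the latter.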

Implementation Pattern

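// Assumes the _logger field and IsTransient helper from OrderProcessor above, using System.Net
// for HttpStatusCode, and a ValidationException type of your choice (e.g. from DataAnnotations).
private async Task HandleMessageAsync(ProcessMessageEventArgs args)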
{
    try
    {
        await ProcessAsync(args.Message);
        await args.CompleteMessageAsync(args.Message);
    }
    catch (ValidationException ex)
    {
        // Permanent: bad data, will never succeed
        await args.DeadLetterMessageAsync(args.Message,
            deadLetterReason: "ValidationFailed",
            deadLetterErrorDescription: ex.Message);
    }
    catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
    {
        // Permanent: resource doesn't exist
        await args.DeadLetterMessageAsync(args.Message,
            deadLetterReason: "ResourceNotFound",
            deadLetterErrorDescription: ex.Message);
    }
    catch (Exception ex) when (IsTransient(ex))
    {
        // Transient: abandon for redelivery
        await args.AbandonMessageAsync(args.Message,
            new Dictionary<string, object> { ["LastError"] = ex.Message });
    }
    catch (Exception ex)
    {
        // Unknown: log and abandon, let MaxDeliveryCount protect us
        _logger.LogError(ex, "Unexpected error processing message {Id}", args.Message.MessageId);
        await args.AbandonMessageAsync(args.Message);
    }
}

Retry Configuration for Azure Functions

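When using Service Bus triggers in Azure Functions, configure message handling in host.json. The isolated worker examples below use version 5.x of the Service Bus extension, where these settings sit directly under serviceBus:

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "autoCompleteMessages": false,
      "maxConcurrentCalls": 16,
      "maxAutoLockRenewalDuration": "00:05:00",
      "maxMessageBatchSize": 10,
      "clientRetryOptions": {
        "mode": "exponential",
        "tryTimeout": "00:01:00",
        "delay": "00:00:02",
        "maxDelay": "00:00:30",
        "maxRetries": 3
      }
    }
  }
}

Note: clientRetryOptions controls the SDK's communication retries (the same settings as ServiceBusRetryOptions), not re-execution of your function; a message your function fails to process is redelivered according to the queue's MaxDeliveryCount. The nested messageHandlerOptions and batchOptions sections (with keys such as autoComplete and maxAutoRenewDuration) belong to extension 4.x and earlier and should not be mixed with 5.x keys. Function-level retry policies (a top-level retry section or retry attributes) are, as far as we know, no longer applied to Service Bus triggers in current Functions runtimes; check the Functions error-handling documentation for your runtime version, and where such policies do apply, be careful not to multiply retries unintentionally.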

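Function-Level Retry Attribute

The isolated worker also offers retry attributes such as [ExponentialBackoffRetry(3, "00:00:02", "00:00:30")]. As far as we know, current Functions runtimes honor them only for a few trigger types (such as Timer and Event Hubs) and not for Service Bus, so verify against the error-handling documentation for your runtime version. For a Service Bus trigger, settle the message explicitly and let MaxDeliveryCount drive redelivery:

[Function("ProcessOrder")]
public async Task Run(
    [ServiceBusTrigger("order-processing", Connection = "ServiceBusConnection")]
    ServiceBusReceivedMessage message,
    ServiceBusMessageActions messageActions)
{
    var order = message.Body.ToObjectFromJson<Order>();
    await ProcessOrderAsync(order);

    // Reached only on success. If ProcessOrderAsync throws, the message is not completed and is
    // redelivered after it is abandoned or its lock expires, counting toward MaxDeliveryCount.
    await messageActions.CompleteMessageAsync(message);
}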

Dead-Letter Queue Processing

After all retries are exhausted, messages land in the dead-letter queue. You need a strategy to handle them:

public class DeadLetterProcessor
{
    private readonly ILogger<DeadLetterProcessor> _logger;
    private readonly IAlertService _alertService; // hypothetical: your own storage/alerting service

    public DeadLetterProcessor(ILogger<DeadLetterProcessor> logger, IAlertService alertService)
    {
        _logger = logger;
        _alertService = alertService;
    }

    [Function("ProcessDeadLetters")]
    public async Task Run(
        [ServiceBusTrigger("order-processing/$DeadLetterQueue", Connection = "ServiceBusConnection")]
        ServiceBusReceivedMessage message,
        ServiceBusMessageActions messageActions)
    {
        var deadLetterReason = message.DeadLetterReason;
        var errorDescription = message.DeadLetterErrorDescription;
        var deliveryCount = message.DeliveryCount;

        _logger.LogError(
            "Dead-lettered message {Id}: Reason={Reason}, Error={Error}, Deliveries={Count}",
            message.MessageId, deadLetterReason, errorDescription, deliveryCount);

        // Store or alert first: completing removes the message from the DLQ permanently
        await _alertService.NotifyDeadLetterAsync(message);
        await messageActions.CompleteMessageAsync(message);
    }
}
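The right strategy depends on your domain, but triaging by DeadLetterReason is a common starting point. The sketch below is a hypothetical policy, not a Service Bus rule: `MaxDeliveryCountExceeded` is the reason Service Bus itself sets when delivery attempts run out, while the other strings are the custom reasons used earlier in this guide. `DeadLetterAction` and `DeadLetterTriage` are illustrative names.

```csharp
using System;

public enum DeadLetterAction { ReplayAfterOutage, FixData, Investigate }

// Hypothetical triage policy; the mapping is an example, not a Service Bus rule.
public static class DeadLetterTriage
{
    public static DeadLetterAction Decide(string deadLetterReason) => deadLetterReason switch
    {
        // Set by Service Bus: retries ran out, often during a downstream outage
        "MaxDeliveryCountExceeded" => DeadLetterAction.ReplayAfterOutage,
        // Custom reasons from this guide: the data itself is bad, replaying alone won't help
        "ValidationFailed" or "ResourceNotFound" or "PermanentProcessingFailure" => DeadLetterAction.FixData,
        // Anything else (including a missing reason) needs a human look
        _ => DeadLetterAction.Investigate
    };
}
```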

Real-World Retry Patterns

Pattern 1: Circuit Breaker + Retry

When a downstream service is completely down, retrying every message wastes resources. Combine retry with a circuit breaker:

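// Create both policies once and share them across all messages: a breaker only
// trips if it sees the failures of many messages, not just one.
var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(
        exceptionsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromMinutes(1));

var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

// Outermost policy first: the breaker wraps the retry, so one exhausted retry sequence
// counts as one failure. While the circuit is open, calls fail fast with BrokenCircuitException.
var combinedPolicy = Policy.WrapAsync(circuitBreakerPolicy, retryPolicy);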
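To make the breaker's behavior concrete, here is a deliberately simplified, hand-rolled consecutive-failure breaker. It is not Polly's implementation (Polly's breaker is thread-safe and has an explicit half-open state); `SimpleCircuitBreaker` is hypothetical, and the clock is injected so the open and close transitions can be tested.

```csharp
using System;

// Hypothetical, simplified circuit breaker: closed -> open -> trial call after the break.
// Not thread-safe; guard it with a lock if concurrent handlers share one instance.
public sealed class SimpleCircuitBreaker
{
    private readonly int _failuresAllowed;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTimeOffset> _now;
    private int _consecutiveFailures;
    private DateTimeOffset? _openedAt;

    public SimpleCircuitBreaker(int failuresAllowed, TimeSpan breakDuration, Func<DateTimeOffset> now)
    {
        _failuresAllowed = failuresAllowed;
        _breakDuration = breakDuration;
        _now = now;
    }

    // While open, callers should skip the downstream call instead of hammering it.
    public bool IsOpen => _openedAt is { } opened && _now() - opened < _breakDuration;

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        _openedAt = null; // close the circuit
    }

    public void RecordFailure()
    {
        _consecutiveFailures++;
        if (_consecutiveFailures >= _failuresAllowed)
        {
            _openedAt = _now(); // open, or reopen after a failed trial call
        }
    }
}
```

While the circuit is open, abandoning still increments each message's delivery count, so a long outage can push healthy messages into the DLQ; pausing the processor (for example with `ServiceBusProcessor.StopProcessingAsync`) is one way to avoid that.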

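Pattern 2: Scheduled Retry with Resubmitted Messages

For failures that need longer delays between retries (e.g., waiting for an external system to recover), abandoning is too fast: the message becomes available for redelivery almost immediately. Instead, complete the original and send a copy scheduled for later delivery. This is different from Service Bus message deferral (DeferMessageAsync), which sets a message aside until it is explicitly received again by sequence number.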

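catch (Exception ex) when (IsTransient(ex))
{
    // message = args.Message; sender = a ServiceBusSender for the same queue. A resubmitted
    // copy starts again at DeliveryCount = 1, so count retries in an application property.
    var retryCount = message.ApplicationProperties.TryGetValue("RetryCount", out var value) && value is int n ? n : 0;
    if (retryCount >= 3)
    {
        await args.DeadLetterMessageAsync(message, "RetriesExhausted", ex.Message);
        return;
    }
    var delay = TimeSpan.FromMinutes(Math.Pow(5, retryCount)); // 1, 5, then 25 minutes
    var originalId = message.ApplicationProperties.TryGetValue("OriginalMessageId", out var id) ? id : message.MessageId;
    var retryMessage = new ServiceBusMessage(message.Body)
    {
        ContentType = message.ContentType,
        ScheduledEnqueueTime = DateTimeOffset.UtcNow.Add(delay),
        ApplicationProperties = { ["OriginalMessageId"] = originalId, ["RetryCount"] = retryCount + 1 }
    };
    // Not atomic: if completing fails after the send, the message will be processed twice
    await sender.SendMessageAsync(retryMessage);
    await args.CompleteMessageAsync(message);
}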
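The delay arithmetic is easy to get wrong, so here it is in isolation. This sketch assumes the number of retries so far travels with the message (for example in an application property), because a resubmitted copy starts again at DeliveryCount = 1. `ScheduledRetryPlan.NextDelay` is a hypothetical helper that returns null when the message should be dead-lettered instead of rescheduled.

```csharp
using System;

// Hypothetical helper for the scheduled-retry pattern: delay = 5^retriesSoFar minutes.
public static class ScheduledRetryPlan
{
    // retriesSoFar: how many scheduled retries this message has already had.
    // Returns null once maxRetries is reached, meaning "dead-letter instead".
    public static TimeSpan? NextDelay(int retriesSoFar, int maxRetries)
    {
        if (retriesSoFar >= maxRetries)
        {
            return null;
        }
        return TimeSpan.FromMinutes(Math.Pow(5, retriesSoFar)); // 1, 5, 25, ... minutes
    }
}
```

With three retries, a message waits roughly 1 + 5 + 25 = 31 minutes in total before it is dead-lettered.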

Best Practices

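Always disable auto-complete — Set AutoCompleteMessages = false on ServiceBusProcessorOptions (or autoCompleteMessages: false in host.json) to take explicit control over when messages are completed, abandoned, or dead-lettered.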
Classify failures early — Determine if a failure is transient or permanent as close to the error source as possible. Don't retry permanent failures.
Add jitter to backoff — Prevent thundering herd when many consumers retry simultaneously:

var jitter = TimeSpan.FromMilliseconds(Random.Shared.Next(0, 1000));
var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt)) + jitter;

Log delivery count — Always include message.DeliveryCount in your logs to track retry progression.
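Set reasonable lock duration — Ensure the lock (including automatic renewal, which ServiceBusProcessor performs for up to MaxAutoLockRenewalDuration, 5 minutes by default) outlasts your total retry time within a single delivery. If the lock expires, the message is redelivered, incrementing the delivery count unexpectedly and possibly causing duplicate processing.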
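Don't multiply retries — Layers multiply attempts, not retries: 3 Polly retries (4 attempts per delivery) × MaxDeliveryCount 5 = 20 attempts, and a further layer of 3 function retries (4 invocations) raises that to 4 × 5 × 4 = 80. Be intentional about each layer.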
Use dead-letter reason and description — Always provide meaningful context when dead-lettering so operators can diagnose issues without reading code.
Monitor dead-letter queue depth — Set up alerts when the DLQ grows, indicating a systemic issue.
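The numbers behind the lock-duration and "Don't multiply retries" bullets can be checked with a small calculator. This sketch counts attempts rather than retries (n retries mean n + 1 attempts) and adds up only the backoff waits inside one delivery, ignoring processing time; `RetryBudget` is a hypothetical helper.

```csharp
using System;

// Hypothetical back-of-the-envelope calculator for layered retries.
public static class RetryBudget
{
    // Each retry layer contributes (retries + 1) attempts; MaxDeliveryCount already counts deliveries.
    public static int WorstCaseAttempts(int inProcessRetries, int maxDeliveryCount, int functionRetries = 0) =>
        (inProcessRetries + 1) * maxDeliveryCount * (functionRetries + 1);

    // Total backoff inside one delivery for waits of baseSeconds * 2^attempt (attempt = 1..retries).
    public static TimeSpan BackoffPerDelivery(int retries, double baseSeconds)
    {
        double total = 0;
        for (int attempt = 1; attempt <= retries; attempt++)
        {
            total += baseSeconds * Math.Pow(2, attempt);
        }
        return TimeSpan.FromSeconds(total);
    }
}
```

For the Polly policy used earlier, 2 + 4 + 8 = 14 seconds of backoff fits comfortably inside a 2-minute lock, while 3 retries combined with MaxDeliveryCount = 5 already mean 20 attempts per message.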

Common Pitfalls

Pitfall	Impact	Solution
Retrying permanent failures	Delays dead-lettering, wastes resources	Classify errors and dead-letter immediately for permanent failures
Lock expiration during retry	Unexpected redelivery, duplicate processing	Increase lock duration or reduce retry count
No dead-letter monitoring	Failed messages accumulate silently	Alert on DLQ message count > 0
Infinite retry loops	Message never reaches DLQ	Always have MaxDeliveryCount as a safety net
Completing before processing	Message lost on failure	Use peek-lock receive mode (the default) and complete explicitly only after processing succeeds

Next Steps

Dead-Letter Queue Processing — Strategies for handling failed messages
Service Bus Sessions — Ordered message processing with retry

Azure Integration Hub - Intermediate Level