Retry Policies & Delivery Count in Azure Service Bus
Overview
In distributed messaging systems, transient failures are inevitable. Network blips, temporary service outages, database connection timeouts, and resource throttling can all cause message processing to fail. A robust retry strategy ensures your system recovers gracefully from these temporary issues without losing messages or requiring manual intervention.
Azure Service Bus provides built-in retry mechanisms at both the client SDK level and the broker level (via delivery count). This guide covers how to configure and combine these mechanisms effectively.
Why Retry Matters in Messaging
Without retry logic, a single transient failure can cause:
- Message loss — If you complete a message before processing succeeds
- System stalls — If failed messages block the queue
- Manual intervention — Operators must investigate and replay messages
- Data inconsistency — Partially processed operations leave your system in a broken state
A well-designed retry strategy handles transient failures automatically while routing permanently failed messages to a dead-letter queue for investigation.
Built-in Service Bus Retry: MaxDeliveryCount
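Service Bus tracks how many times a message has been delivered to a consumer. The first delivery has a DeliveryCount of 1, and the count increments each time the message is abandoned (or its lock expires) and the broker redelivers it. Once the message has been delivered MaxDeliveryCount times without being completed, the broker automatically moves it to the dead-letter queue with the reason MaxDeliveryCountExceeded.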
How It Works
1. Message received (DeliveryCount = 1)
2. Processing fails → Abandon message
3. Message redelivered (DeliveryCount = 2)
4. Processing fails again → Abandon message
5. Message redelivered (DeliveryCount = 3)
6. The cycle repeats until the message has been delivered MaxDeliveryCount times (default: 10) without being completed
7. Message automatically moved to the dead-letter queue
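The counting rule in the steps above can be traced with a small simulation. This is only a sketch of the rule, not broker code: `DeliveryCountSimulation`, `Simulate`, and the `succeedsOnAttempt` parameter are hypothetical names, and it assumes every failed attempt ends with the message being abandoned.

```csharp
using System;

public static class DeliveryCountSimulation
{
    public enum Outcome { Completed, DeadLettered }

    // Simulates a handler that fails until delivery number `succeedsOnAttempt`
    // (pass int.MaxValue for a handler that never succeeds).
    public static (Outcome Outcome, int Deliveries) Simulate(int maxDeliveryCount, int succeedsOnAttempt)
    {
        for (int deliveryCount = 1; deliveryCount <= maxDeliveryCount; deliveryCount++)
        {
            if (deliveryCount >= succeedsOnAttempt)
            {
                // Handler succeeds and completes the message.
                return (Outcome.Completed, deliveryCount);
            }
            // Otherwise the handler abandons the message and the broker redelivers it.
        }

        // Delivered maxDeliveryCount times without completion: the broker dead-letters it.
        return (Outcome.DeadLettered, maxDeliveryCount);
    }
}
```

With the default limit of 10, a handler that always fails sees the message ten times before it is dead-lettered, while a handler that recovers on the third delivery completes it with DeliveryCount = 3.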
Configuring MaxDeliveryCount
Via C# SDK
using Azure.Messaging.ServiceBus.Administration;
var adminClient = new ServiceBusAdministrationClient(connectionString);
// Create a new queue with custom delivery count
var queueOptions = new CreateQueueOptions("order-processing")
{
MaxDeliveryCount = 5,
LockDuration = TimeSpan.FromMinutes(2),
DeadLetteringOnMessageExpiration = true
};
await adminClient.CreateQueueAsync(queueOptions);
// Update an existing queue
var queueProperties = await adminClient.GetQueueAsync("order-processing");
queueProperties.Value.MaxDeliveryCount = 3;
await adminClient.UpdateQueueAsync(queueProperties.Value);
Via Azure CLI
# Create queue with max delivery count of 5
az servicebus queue create \
--resource-group myRG \
--namespace-name myNamespace \
--name order-processing \
--max-delivery-count 5 \
--lock-duration PT2M
# Update existing queue
az servicebus queue update \
--resource-group myRG \
--namespace-name myNamespace \
--name order-processing \
--max-delivery-count 3
Choosing the Right MaxDeliveryCount
| Scenario | Recommended Value | Reasoning |
|---|---|---|
| Idempotent operations | 3–5 | Quick failure detection, safe to retry |
| External API calls | 5–10 | Allow time for transient outages to resolve |
| Critical financial transactions | 1–2 | Fail fast, investigate manually |
| Batch processing | 10+ | Allow many retries before dead-lettering |
Client-Level Retry (SDK RetryOptions)
The Service Bus SDK includes built-in retry logic for transient communication errors (network timeouts, throttling, server busy). This is separate from the broker-level delivery count.
using Azure.Messaging.ServiceBus;
var retryOptions = new ServiceBusRetryOptions
{
Mode = ServiceBusRetryMode.Exponential,
MaxRetries = 5,
Delay = TimeSpan.FromSeconds(1), // Initial delay
MaxDelay = TimeSpan.FromSeconds(60), // Cap on delay
TryTimeout = TimeSpan.FromSeconds(30) // Timeout per attempt
};
var clientOptions = new ServiceBusClientOptions
{
RetryOptions = retryOptions
};
var client = new ServiceBusClient(connectionString, clientOptions);
Retry Modes
| Mode | Behavior | Best For |
|---|---|---|
| Exponential | Delay doubles each retry: 1s → 2s → 4s → 8s... | Most scenarios — prevents thundering herd |
| Fixed | Constant delay between retries | Predictable timing requirements |
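To see how Delay and MaxDelay shape the two modes, the schedule can be approximated with a few lines of arithmetic. This is a sketch, not the SDK's implementation: `BackoffSchedule.Delays` is a hypothetical helper, and it ignores any randomization the SDK may apply, so real delays can differ somewhat.

```csharp
using System;
using System.Collections.Generic;

public static class BackoffSchedule
{
    // Approximate wait before each retry (1-based), capped at maxDelay, without jitter.
    public static IReadOnlyList<TimeSpan> Delays(bool exponential, int maxRetries, TimeSpan delay, TimeSpan maxDelay)
    {
        var delays = new List<TimeSpan>();
        for (int retry = 1; retry <= maxRetries; retry++)
        {
            double seconds = exponential
                ? delay.TotalSeconds * Math.Pow(2, retry - 1) // 1x, 2x, 4x, ...
                : delay.TotalSeconds;                         // constant
            delays.Add(TimeSpan.FromSeconds(Math.Min(seconds, maxDelay.TotalSeconds)));
        }
        return delays;
    }
}
```

For the options shown earlier (Delay = 1s, MaxDelay = 60s, MaxRetries = 5), the exponential schedule is roughly 1s, 2s, 4s, 8s, 16s. The cap only starts to matter from the seventh retry, where 64s would be clamped to 60s.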
Important: SDK retry handles communication failures (connecting to Service Bus). It does NOT retry your message processing logic. For that, you need application-level retry.
Application-Level Retry with Polly
For retrying your own processing logic (database calls, HTTP requests, etc.), use the Polly resilience library:
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient, depending on your data provider
using Microsoft.Extensions.Logging;
using Polly;
using Polly.Retry;

public class OrderProcessor
{
    private readonly ServiceBusProcessor _processor;
    private readonly AsyncRetryPolicy _retryPolicy;
    private readonly ILogger<OrderProcessor> _logger;

    public OrderProcessor(ServiceBusClient client, ILogger<OrderProcessor> logger)
    {
        _logger = logger;

        // Define retry policy: 3 retries with exponential backoff (2s, 4s, 8s)
        _retryPolicy = Policy
            .Handle<HttpRequestException>()
            .Or<TimeoutException>()
            .Or<SqlException>(ex => IsTransient(ex))
            .WaitAndRetryAsync(
                retryCount: 3,
                sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
                onRetry: (exception, delay, attempt, context) =>
                {
                    _logger.LogWarning(
                        "Retry {Attempt} after {Delay}s due to: {Error}",
                        attempt, delay.TotalSeconds, exception.Message);
                });

        // Settle messages explicitly instead of relying on auto-complete
        _processor = client.CreateProcessor("order-processing",
            new ServiceBusProcessorOptions { AutoCompleteMessages = false });
        _processor.ProcessMessageAsync += HandleMessageAsync;
        _processor.ProcessErrorAsync += HandleErrorAsync;
    }

    private async Task HandleMessageAsync(ProcessMessageEventArgs args)
    {
        // Order and ProcessOrderAsync are application code, omitted here
        var order = args.Message.Body.ToObjectFromJson<Order>();
        try
        {
            // Retry transient failures within this delivery attempt
            await _retryPolicy.ExecuteAsync(async () =>
            {
                await ProcessOrderAsync(order);
            });

            // Success: complete the message
            await args.CompleteMessageAsync(args.Message);
        }
        catch (Exception ex) when (IsTransient(ex))
        {
            // All retries exhausted but still transient: abandon for redelivery
            _logger.LogWarning("Transient failure after retries, abandoning for redelivery: {Error}", ex.Message);
            await args.AbandonMessageAsync(args.Message);
        }
        catch (Exception ex)
        {
            // Permanent failure: dead-letter immediately
            _logger.LogError(ex, "Permanent failure processing order {OrderId}", order.Id);
            await args.DeadLetterMessageAsync(args.Message,
                deadLetterReason: "PermanentProcessingFailure",
                deadLetterErrorDescription: ex.Message);
        }
    }

    // Called for errors raised by the processor itself (receive failures, lock loss, handler exceptions)
    private Task HandleErrorAsync(ProcessErrorEventArgs args)
    {
        _logger.LogError(args.Exception, "Service Bus error from {Source}", args.ErrorSource);
        return Task.CompletedTask;
    }

    // SQL error numbers: 1205 = deadlock victim, -2 = timeout, 40613 = database unavailable
    private static bool IsTransient(Exception ex) =>
        ex is HttpRequestException or TimeoutException or SqlException { Number: 1205 or -2 or 40613 };
}
Distinguishing Transient vs Permanent Failures
This is the most critical decision in your retry logic. Retrying a permanent failure wastes resources and delays dead-lettering.
| Failure Type | Examples | Action |
|---|---|---|
| Transient | Network timeout, HTTP 429/503, SQL deadlock, connection reset | Retry with backoff |
| Permanent | Invalid message format, business rule violation, HTTP 400/404, missing required data | Dead-letter immediately |
| Unknown | Unexpected exceptions | Abandon (let delivery count handle it) |
Implementation Pattern
private async Task HandleMessageAsync(ProcessMessageEventArgs args)
{
try
{
await ProcessAsync(args.Message);
await args.CompleteMessageAsync(args.Message);
}
catch (ValidationException ex)
{
// Permanent: bad data, will never succeed
await args.DeadLetterMessageAsync(args.Message,
deadLetterReason: "ValidationFailed",
deadLetterErrorDescription: ex.Message);
}
catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
{
// Permanent: resource doesn't exist
await args.DeadLetterMessageAsync(args.Message,
deadLetterReason: "ResourceNotFound",
deadLetterErrorDescription: ex.Message);
}
catch (Exception ex) when (IsTransient(ex))
{
// Transient: abandon for redelivery
await args.AbandonMessageAsync(args.Message,
new Dictionary<string, object> { ["LastError"] = ex.Message });
}
catch (Exception ex)
{
// Unknown: log and abandon, let MaxDeliveryCount protect us
_logger.LogError(ex, "Unexpected error processing message {Id}", args.Message.MessageId);
await args.AbandonMessageAsync(args.Message);
}
}
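The catch chain above can also be expressed as a single classification function, which keeps the transient/permanent decision in one place and makes it easy to unit test. The sketch below is one possible shape, not a prescribed design: `FailureAction` and `FailureClassifier` are hypothetical names, and `FormatException` and `JsonException` stand in for whatever "invalid message format" means in your application.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text.Json;

public enum FailureAction { Retry, DeadLetter, Abandon }

public static class FailureClassifier
{
    public static FailureAction Classify(Exception ex) => ex switch
    {
        // Transient HTTP statuses: throttling and temporary unavailability
        HttpRequestException { StatusCode: HttpStatusCode.TooManyRequests or HttpStatusCode.ServiceUnavailable }
            => FailureAction.Retry,
        // Client errors that will not succeed on retry
        HttpRequestException { StatusCode: HttpStatusCode.BadRequest or HttpStatusCode.NotFound }
            => FailureAction.DeadLetter,
        // No status code usually means a connection-level failure
        HttpRequestException { StatusCode: null } => FailureAction.Retry,
        TimeoutException => FailureAction.Retry,
        // Malformed payloads never become valid by waiting
        FormatException or JsonException => FailureAction.DeadLetter,
        // Unknown: abandon and let MaxDeliveryCount act as the safety net
        _ => FailureAction.Abandon
    };
}
```

A handler could then switch on the returned action instead of stacking catch blocks, mapping Retry and Abandon to AbandonMessageAsync and DeadLetter to DeadLetterMessageAsync.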
Retry Configuration for Azure Functions
When using Service Bus triggers in Azure Functions, configure retry behavior in host.json. The Service Bus settings below use the property names of version 5.x and later of the Service Bus extension; earlier versions nest similar settings under messageHandlerOptions and batchOptions with different names.
{
    "version": "2.0",
    "extensions": {
        "serviceBus": {
            "clientRetryOptions": {
                "tryTimeout": "00:01:00"
            },
            "autoCompleteMessages": false,
            "maxConcurrentCalls": 16,
            "maxAutoLockRenewalDuration": "00:05:00",
            "maxMessageBatchSize": 10
        }
    },
    "retry": {
        "strategy": "exponentialBackoff",
        "maxRetryCount": 3,
        "minimumInterval": "00:00:02",
        "maximumInterval": "00:00:30"
    }
}
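Note: The retry section in host.json controls function-level retry (re-invoking the function). This is in addition to Service Bus delivery count, so be careful not to multiply retries unintentionally. Function-level retry policies are not honored for every trigger type; check the Azure Functions retry documentation for your runtime version to confirm they apply to Service Bus triggers before relying on them.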
Function-Level Retry Attribute
[Function("ProcessOrder")]
[ExponentialBackoffRetry(3, "00:00:02", "00:00:30")]
public async Task Run(
[ServiceBusTrigger("order-processing", Connection = "ServiceBusConnection")]
ServiceBusReceivedMessage message,
ServiceBusMessageActions messageActions)
{
var order = message.Body.ToObjectFromJson<Order>();
await ProcessOrderAsync(order);
await messageActions.CompleteMessageAsync(message);
}
Dead-Letter Queue Processing
After all retries are exhausted, messages land in the dead-letter queue. You need a strategy to handle them:
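public class DeadLetterProcessor
{
    private readonly ILogger<DeadLetterProcessor> _logger;
    private readonly IAlertService _alertService; // application-defined notification service

    public DeadLetterProcessor(ILogger<DeadLetterProcessor> logger, IAlertService alertService)
    {
        _logger = logger;
        _alertService = alertService;
    }

    [Function("ProcessDeadLetters")]
    public async Task Run(
        [ServiceBusTrigger("order-processing/$deadletterqueue", Connection = "ServiceBusConnection")]
        ServiceBusReceivedMessage message,
        ServiceBusMessageActions messageActions)
    {
        var deadLetterReason = message.DeadLetterReason;
        var errorDescription = message.DeadLetterErrorDescription;
        var deliveryCount = message.DeliveryCount;

        _logger.LogError(
            "Dead-lettered message {Id}: Reason={Reason}, Error={Error}, Deliveries={Count}",
            message.MessageId, deadLetterReason, errorDescription, deliveryCount);

        // Notify operators so the message can be investigated
        await _alertService.NotifyDeadLetterAsync(message);

        // Remove the message from the dead-letter queue once it has been recorded
        await messageActions.CompleteMessageAsync(message);
    }
}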
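One way to turn the dead-letter reason into a handling strategy is a small triage function. The sketch below is an illustration under assumptions, not a recommended policy: `DeadLetterAction` and `DeadLetterTriage` are hypothetical names, and the application-set reasons are simply the ones used in this guide's earlier examples. MaxDeliveryCountExceeded is the reason the broker itself sets when redelivery attempts run out.

```csharp
using System;

public enum DeadLetterAction { ResubmitCandidate, NeedsDataFix, ManualReview }

public static class DeadLetterTriage
{
    public static DeadLetterAction Decide(string deadLetterReason) => deadLetterReason switch
    {
        // Broker-set: retries ran out, often because a transient outage lasted too long
        "MaxDeliveryCountExceeded" => DeadLetterAction.ResubmitCandidate,
        // Application-set: the message itself must be corrected before replaying
        "ValidationFailed" or "ResourceNotFound" or "PermanentProcessingFailure"
            => DeadLetterAction.NeedsDataFix,
        // Anything else (including a missing reason) goes to a human
        _ => DeadLetterAction.ManualReview
    };
}
```

Keeping this decision separate from the trigger function lets operators adjust the mapping without touching the message-handling code.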
Real-World Retry Patterns
Pattern 1: Circuit Breaker + Retry
When a downstream service is completely down, retrying every message wastes resources. Combine retry with a circuit breaker:
var circuitBreakerPolicy = Policy
.Handle<HttpRequestException>()
.CircuitBreakerAsync(
exceptionsAllowedBeforeBreaking: 5,
durationOfBreak: TimeSpan.FromMinutes(1));
var retryPolicy = Policy
.Handle<HttpRequestException>()
.WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));
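// Policy.WrapAsync lists policies outermost first: the circuit breaker wraps the retry,
// so the breaker counts one failure per call, after that call's retries are exhausted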
var combinedPolicy = Policy.WrapAsync(circuitBreakerPolicy, retryPolicy);
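To make the breaker's behavior concrete, here is a minimal hand-rolled state tracker. It is not Polly's implementation and omits much of what Polly provides (thread safety, a distinct half-open state, exception filtering); `SimpleCircuitBreaker` is a hypothetical class, and the clock is passed in as a parameter only to keep the sketch deterministic.

```csharp
using System;

public sealed class SimpleCircuitBreaker
{
    private readonly int _failuresBeforeBreaking;
    private readonly TimeSpan _durationOfBreak;
    private int _consecutiveFailures;
    private DateTimeOffset? _openedAt;

    public SimpleCircuitBreaker(int failuresBeforeBreaking, TimeSpan durationOfBreak)
    {
        _failuresBeforeBreaking = failuresBeforeBreaking;
        _durationOfBreak = durationOfBreak;
    }

    // True while calls should be rejected without reaching the downstream service.
    public bool IsOpen(DateTimeOffset now) =>
        _openedAt is DateTimeOffset opened && now - opened < _durationOfBreak;

    // A success closes the circuit and resets the failure streak.
    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        _openedAt = null;
    }

    // Once the streak reaches the threshold, each further failure (re)opens the circuit.
    public void RecordFailure(DateTimeOffset now)
    {
        _consecutiveFailures++;
        if (_consecutiveFailures >= _failuresBeforeBreaking)
        {
            _openedAt = now;
        }
    }
}
```

With Polly itself, a call made while the circuit is open fails immediately with BrokenCircuitException. A message handler should treat that exception as transient and abandon the message rather than dead-letter it, since the message was never actually processed.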
Pattern 2: Delayed Retry with Scheduled Messages
For failures that need longer delays between retries (e.g., waiting for an external system to recover):
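catch (Exception ex) when (IsTransient(ex))
{
    // The re-sent copy is a new message whose DeliveryCount restarts at 1,
    // so track attempts in an application property instead.
    var previousRetries = args.Message.ApplicationProperties.TryGetValue("RetryCount", out var value)
        ? Convert.ToInt32(value) : 0;
    if (previousRetries >= 3)
    {
        // Scheduled retries exhausted: stop re-sending and dead-letter
        await args.DeadLetterMessageAsync(args.Message, "ScheduledRetriesExhausted", ex.Message);
        return;
    }
    // Schedule the retry with an increasing delay: 5, 25, then 125 minutes
    var delay = TimeSpan.FromMinutes(Math.Pow(5, previousRetries + 1));
    var retryMessage = new ServiceBusMessage(args.Message.Body)
    {
        ScheduledEnqueueTime = DateTimeOffset.UtcNow.Add(delay),
        ApplicationProperties =
        {
            ["OriginalMessageId"] = args.Message.MessageId,
            ["RetryCount"] = previousRetries + 1
        }
    };
    await sender.SendMessageAsync(retryMessage);
    await args.CompleteMessageAsync(args.Message);
}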
Best Practices
- Always set autoComplete: false. Take explicit control over when messages are completed, abandoned, or dead-lettered.
- Classify failures early. Determine if a failure is transient or permanent as close to the error source as possible. Don't retry permanent failures.
- Add jitter to backoff. Prevent thundering herd when many consumers retry simultaneously:
    var jitter = TimeSpan.FromMilliseconds(Random.Shared.Next(0, 1000));
    var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt)) + jitter;
- Log delivery count. Always include message.DeliveryCount in your logs to track retry progression.
- Set reasonable lock duration. Ensure your lock duration exceeds your total retry time within a single delivery. If the lock expires, the message is redelivered (incrementing delivery count unexpectedly).
- Don't multiply retries. Retry layers compound: 3 Polly retries (4 attempts per invocation) × 3 function retries (4 invocations per delivery) × a MaxDeliveryCount of 5 allows up to 4 × 4 × 5 = 80 processing attempts for a single message. Be intentional about each layer.
- Use dead-letter reason and description. Always provide meaningful context when dead-lettering so operators can diagnose issues without reading code.
- Monitor dead-letter queue depth. Set up alerts when the DLQ grows, indicating a systemic issue.
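Two of the practices above, lock duration and retry multiplication, come down to arithmetic that is easy to check before deploying. Note that a setting of N retries means N + 1 attempts at that layer. The helper below is a sketch: `RetryBudget` and its method names are hypothetical, and it assumes the same 2^attempt-second backoff as the Polly example and a fixed worst-case time per processing attempt.

```csharp
using System;

public static class RetryBudget
{
    // Worst-case number of times the processing logic runs for one message.
    public static int MaxAttempts(int pollyRetries, int functionRetries, int maxDeliveryCount) =>
        (pollyRetries + 1) * (functionRetries + 1) * maxDeliveryCount;

    // Time one delivery can spend in in-process retries: backoff waits plus the attempts themselves.
    public static TimeSpan WorstCaseDeliveryTime(int pollyRetries, TimeSpan perAttempt)
    {
        double backoffSeconds = 0;
        for (int attempt = 1; attempt <= pollyRetries; attempt++)
        {
            backoffSeconds += Math.Pow(2, attempt); // 2s, 4s, 8s, ...
        }
        return TimeSpan.FromSeconds(backoffSeconds) + perAttempt * (pollyRetries + 1);
    }
}
```

For example, `RetryBudget.MaxAttempts(3, 3, 5)` gives 80, and `RetryBudget.WorstCaseDeliveryTime(3, TimeSpan.FromSeconds(30))` gives 134 seconds, which already exceeds a two-minute lock unless the lock is renewed.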
Common Pitfalls
| Pitfall | Impact | Solution |
|---|---|---|
| Retrying permanent failures | Delays dead-lettering, wastes resources | Classify errors and dead-letter immediately for permanent failures |
| Lock expiration during retry | Unexpected redelivery, duplicate processing | Increase lock duration or reduce retry count |
| No dead-letter monitoring | Failed messages accumulate silently | Alert on DLQ message count > 0 |
| Infinite retry loops | Message never reaches DLQ | Always have MaxDeliveryCount as a safety net |
| Completing before processing | Message lost on failure | Set autoComplete: false, complete explicitly after success |
Next Steps
- Dead-Letter Queue Processing — Strategies for handling failed messages
- Service Bus Sessions — Ordered message processing with retry