Implementing Reliable Messaging with Dead Letter Queues
Why Dead Letter Queues Matter
In a distributed system, messages fail. It's not a matter of "if" but "when." A message might fail because:
- Transient failures - Network hiccups, temporary service unavailability
- Validation errors - Message format is incorrect or missing required fields
- Business logic failures - Message is valid but violates business rules
- Poison messages - Malformed data that always causes processing failures
- Timeout issues - Message takes too long to process
Without a proper dead letter strategy, failed messages either:
- Get lost forever (data loss)
- Keep retrying infinitely (resource exhaustion)
- Block your entire processing pipeline
Dead Letter Queues (DLQs) solve this by providing a safe landing place for messages that can't be processed successfully.
Understanding Service Bus Dead Letter Mechanics
How Messages End Up in DLQ
┌────────────────────────────────────────────────────────────┐
│                  Message Processing Flow                   │
└────────────────────────────────────────────────────────────┘
                  ┌─────────────────┐
                  │   New Message   │
                  │     arrives     │
                  └────────┬────────┘
                           │
                           ▼
            ┌──────────────────────────────┐
            │ Process Message (Attempt 1)  │
            └──────────────┬───────────────┘
                           │
      ┌────────────────────┼────────────────────┐
      ▼                    ▼                    ▼
┌───────────┐        ┌───────────┐        ┌────────────┐
│  Success  │        │  Failed   │        │ Exception  │
│           │        │           │        │   Thrown   │
└─────┬─────┘        └─────┬─────┘        └─────┬──────┘
      │                    │                    │
      ▼                    ▼                    ▼
┌───────────┐        ┌───────────┐        ┌────────────┐
│ Complete  │        │  Abandon  │        │ Deadletter │
│  Message  │        │  Message  │        │ (Max Del.  │
└───────────┘        └───────────┘        │  Reached)  │
                                          └──────┬─────┘
                                                 │
                                                 ▼
                                        ┌──────────────────┐
                                        │   Dead Letter    │
                                        │      Queue       │
                                        │($DeadLetterQueue)│
                                        └──────────────────┘
Key Concepts
MaxDeliveryCount - The number of times Service Bus attempts to deliver a message before moving it to the DLQ. Default is 10.
Lock Duration - How long a message is locked for processing. Default is 1 minute; the maximum is 5 minutes. If the message is not settled within this time, the lock expires, the message becomes available again, and its delivery count goes up by one.
Session Id - Allows grouping related messages. Important for maintaining order in processing.
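To make the interaction between MaxDeliveryCount and Lock Duration concrete, here is a minimal sketch that estimates the worst-case time before a message that is never settled ends up in the DLQ. It assumes a consumer that receives the message and then hangs, so each attempt ends only when the lock expires, and it ignores lock renewal and redelivery latency. `DeadLetterEstimator` and `WorstCaseTimeToDeadLetter` are illustrative names, not part of the Service Bus SDK.

```csharp
using System;

public static class DeadLetterEstimator
{
    // Upper bound on how long a never-settled message stays in the main queue:
    // each delivery attempt holds the lock for lockDuration, and after
    // maxDeliveryCount attempts the message is dead-lettered.
    public static TimeSpan WorstCaseTimeToDeadLetter(int maxDeliveryCount, TimeSpan lockDuration)
    {
        if (maxDeliveryCount < 1)
            throw new ArgumentOutOfRangeException(nameof(maxDeliveryCount));
        if (lockDuration <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(lockDuration));

        return TimeSpan.FromTicks(lockDuration.Ticks * maxDeliveryCount);
    }
}
```

For example, 10 delivery attempts with a 1-minute lock give an upper bound of 10 minutes, while 3 attempts with the same lock give 3 minutes.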
Step 1: Configuring Dead Letter Queues
Basic Queue Configuration
using Azure.Messaging.ServiceBus;
using Azure.Messaging.ServiceBus.Administration;

// Create a queue with proper DLQ settings
var queueOptions = new CreateQueueOptions("orders-queue")
{
    // Maximum delivery attempts before dead-lettering
    // Why 3? Balance between:
    // - Enough attempts to handle transient failures
    // - Not too many to cause long delays
    MaxDeliveryCount = 3,

    // How long a message may wait in the queue before it expires
    // Why 7 days? Leaves time to recover from a prolonged consumer outage
    // Note: messages already in the DLQ do not expire; they stay until removed
    DefaultMessageTimeToLive = TimeSpan.FromDays(7),

    // Lock duration - how long each delivery attempt gets
    // Must be long enough for message processing (maximum is 5 minutes)
    LockDuration = TimeSpan.FromMinutes(1),

    // Move expired messages to the DLQ instead of discarding them
    DeadLetteringOnMessageExpiration = true,

    // Enable duplicate detection (helps with reprocessing)
    RequiresDuplicateDetection = true,

    // Duplicate detection time window
    DuplicateDetectionHistoryTimeWindow = TimeSpan.FromMinutes(5)
};

// Create the queue through the administration client
var adminClient = new ServiceBusAdministrationClient(connectionString);
await adminClient.CreateQueueAsync(queueOptions);

// Why use separate queues? Different handling for different message types
// Example: Orders require different processing than notifications
var client = new ServiceBusClient(connectionString);
var processor = client.CreateProcessor("orders-queue", new ServiceBusProcessorOptions
{
    ReceiveMode = ServiceBusReceiveMode.PeekLock,
    // Process messages concurrently
    MaxConcurrentCalls = 10,
    // Settle messages explicitly instead of auto-completing them
    AutoCompleteMessages = false
});
Why These Settings Matter
// Let's understand each setting in detail
// MaxDeliveryCount: 3
// Why not 10 (default)?
// - Faster feedback: You know sooner if there's a problem
// - Lower latency: Failed messages don't block queue for long
// - Trade-off: May cause more false positives for transient errors
// - Mitigation: Use exponential backoff in retry logic
// LockDuration: 1 minute
// Why this value?
// - Too short: Message times out before processing completes
// - Too long: Other consumers wait longer for messages that might fail
// - Rule of thumb: Should be 2-3x your average processing time
// - Monitor: Check if you're seeing many lock-lost errors or rising delivery counts
// DefaultMessageTimeToLive: 7 days
// Why this value?
// - Applies to messages waiting in the main queue, not to messages in the DLQ
// - Long enough to ride out a prolonged consumer outage
// - Not so long that stale messages and storage costs accumulate
// - Important: An expired message is deleted unless DeadLetteringOnMessageExpiration is true
// DeadLetteringOnMessageExpiration: true
// Why this matters:
// - Expired messages often indicate a processing problem
// - Dead letter queue lets you investigate why they expired
// - Without this: Expired messages just disappear (no audit trail)
Topic Subscription Configuration
// For subscriptions, dead-lettering is configured at subscription level
var subscriptionOptions = new CreateSubscriptionOptions(
"orders-topic",
"northamerica-orders"
)
{
// Move expired messages to this subscription's DLQ
// (messages that match no filter are simply never delivered to the subscription)
DeadLetteringOnMessageExpiration = true,
// Lock duration for subscription messages
LockDuration = TimeSpan.FromMinutes(2),
// Time-to-live for messages in subscription
DefaultMessageTimeToLive = TimeSpan.FromDays(7),
// Forward dead letters to a specific queue
ForwardDeadLetteredMessagesTo = "orders-dlq-investigation"
};
// Note: rules cannot delete or expire dead-lettered messages. A rule only
// selects (and optionally annotates) the messages a subscription receives.
// Messages in the DLQ never expire; remove them by receiving and completing
// them from the dead-letter sub-queue (see Step 3).
var ruleOptions = new CreateRuleOptions("dead-letter-rule")
{
Filter = new SqlRuleFilter("sys.Label = 'dead-letter'")
};
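Every queue and every subscription has its own dead-letter sub-queue, addressed as a path under the parent entity. The sketch below shows that path convention as plain string formatting. With the current SDK you would normally select the sub-queue through `SubQueue.DeadLetter` rather than build the path by hand, but the path still shows up in tools and logs. `DeadLetterPaths` is an illustrative helper, not an SDK type.

```csharp
using System;

public static class DeadLetterPaths
{
    private const string Suffix = "$DeadLetterQueue";

    // Queue DLQ: <queue>/$DeadLetterQueue
    public static string ForQueue(string queueName) =>
        $"{Require(queueName)}/{Suffix}";

    // Subscription DLQ: <topic>/Subscriptions/<subscription>/$DeadLetterQueue
    public static string ForSubscription(string topicName, string subscriptionName) =>
        $"{Require(topicName)}/Subscriptions/{Require(subscriptionName)}/{Suffix}";

    private static string Require(string value) =>
        string.IsNullOrWhiteSpace(value)
            ? throw new ArgumentException("Entity name must not be empty.")
            : value;
}
```

For the subscription configured above, `DeadLetterPaths.ForSubscription("orders-topic", "northamerica-orders")` yields `orders-topic/Subscriptions/northamerica-orders/$DeadLetterQueue`.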
Step 2: Implementing Retry Logic
Exponential Backoff Pattern
public class ServiceBusMessageProcessor
{
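private readonly ServiceBusProcessor _processor;
private readonly int _maxRetries = 3;
private readonly ILogger<ServiceBusMessageProcessor> _logger;
// IOrderService stands in for your own order-processing abstraction
private readonly IOrderService _orderService;
public ServiceBusMessageProcessor(
ServiceBusClient client,
IOrderService orderService,
ILogger<ServiceBusMessageProcessor> logger)
{
_logger = logger;
_orderService = orderService;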
// Create processor with custom options
_processor = client.CreateProcessor(
"orders-queue",
new ServiceBusProcessorOptions
{
MaxConcurrentCalls = 10,
AutoCompleteMessages = false,
MaxAutoLockRenewalDuration = TimeSpan.FromMinutes(5)
});
// Set up message handler
_processor.ProcessMessageAsync += ProcessMessageWithRetry;
_processor.ProcessErrorAsync += HandleError;
}
private async Task ProcessMessageWithRetry(ProcessMessageEventArgs args)
{
var message = args.Message;
var retryCount = 0;
var maxRetries = _maxRetries;
// Calculate retry delay with exponential backoff
// Base delay: 1 second
// Formula: delay = baseDelay * (2 ^ (retryCount - 1)) + random jitter
// Example: 1s, 2s, 4s, 8s...
while (retryCount < maxRetries)
{
try
{
// Process the message
await ProcessOrderMessageAsync(message);
// Success! Complete the message
await args.CompleteMessageAsync(message);
_logger.LogInformation("Message {MessageId} processed successfully",
message.MessageId);
return;
}
catch (ValidationException ex)
{
// Validation errors are permanent - don't retry
_logger.LogError(ex, "Message validation failed for {MessageId}. " +
"Moving to DLQ.", message.MessageId);
// Dead-letter immediately - no point retrying invalid data
await args.DeadLetterMessageAsync(message, "ValidationFailed", ex.Message);
return;
}
catch (TransientException ex)
{
retryCount++;
_logger.LogWarning(ex, "Transient error processing message {MessageId}. " +
"Retry {RetryNumber} of {MaxRetries}.",
message.MessageId, retryCount, maxRetries);
if (retryCount >= maxRetries)
{
// All retries exhausted - dead letter
_logger.LogError("Message {MessageId} failed after {MaxRetries} retries. " +
"Moving to DLQ.", message.MessageId, maxRetries);
await args.DeadLetterMessageAsync(message, "MaxRetriesExceeded",
$"Failed after {maxRetries} retries. Last error: {ex.Message}");
return;
}
// Wait before retry with exponential backoff
var delay = CalculateBackoffDelay(retryCount);
_logger.LogInformation("Waiting {Delay}ms before retry", delay.TotalMilliseconds);
await Task.Delay(delay);
}
catch (Exception ex)
{
// Unexpected errors - dead letter immediately for investigation
_logger.LogError(ex, "Unexpected error processing message {MessageId}",
message.MessageId);
await args.DeadLetterMessageAsync(message, "UnexpectedError",
$"{ex.GetType().Name} - {ex.Message}");
return;
}
}
}
private TimeSpan CalculateBackoffDelay(int retryCount)
{
// Exponential backoff: 1s, 2s, 4s, 8s...
var baseDelay = TimeSpan.FromSeconds(1);
var exponentialDelay = baseDelay * Math.Pow(2, retryCount - 1);
// Add jitter (0-1 second) to prevent thundering herd
var jitter = TimeSpan.FromMilliseconds(new Random().Next(0, 1000));
// Cap at 30 seconds maximum
return exponentialDelay + jitter > TimeSpan.FromSeconds(30)
? TimeSpan.FromSeconds(30)
: exponentialDelay + jitter;
}
private Task HandleError(ProcessErrorEventArgs args)
{
_logger.LogError(args.Exception,
"Error processing message. Entity: {Entity}, ErrorSource: {ErrorSource}",
args.EntityPath, args.ErrorSource);
return Task.CompletedTask;
}
private async Task ProcessOrderMessageAsync(ServiceBusReceivedMessage message)
{
// Your actual message processing logic
// This could throw TransientException for recoverable errors
// or other exceptions for non-recoverable errors
var body = message.Body.ToObjectFromJson<Order>();
// Validate the order
if (body == null || string.IsNullOrEmpty(body.CustomerId))
throw new ValidationException("CustomerId is required");
if (body.Items == null || !body.Items.Any())
throw new ValidationException("Order must have at least one item");
// Process the order (might fail with transient errors)
await _orderService.CreateOrderAsync(body);
}
}
// Custom exception types to distinguish error types
public class TransientException : Exception
{
public TransientException(string message) : base(message) { }
public TransientException(string message, Exception inner) : base(message, inner) { }
}
public class ValidationException : Exception
{
public ValidationException(string message) : base(message) { }
}
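The backoff formula used above can be isolated into a small pure function, which makes the schedule easy to inspect and unit test. This is a minimal sketch under the same assumptions as the listing (1-second base, 30-second cap, up to 1 second of jitter). `Backoff.Delay` is an illustrative name, and jitter is optional here so that the schedule can be checked deterministically.

```csharp
#nullable enable
using System;

public static class Backoff
{
    // delay = baseDelay * 2^(attempt - 1), plus optional jitter, capped at maxDelay.
    public static TimeSpan Delay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay,
        Random? jitterSource = null)
    {
        if (attempt < 1)
            throw new ArgumentOutOfRangeException(nameof(attempt));

        // Clamp the exponent so very large attempt numbers cannot overflow.
        var factor = Math.Pow(2, Math.Min(attempt - 1, 30));
        var delayMs = baseDelay.TotalMilliseconds * factor;

        // Up to one second of random jitter spreads out simultaneous retries.
        if (jitterSource != null)
            delayMs += jitterSource.Next(0, 1000);

        return TimeSpan.FromMilliseconds(Math.Min(delayMs, maxDelay.TotalMilliseconds));
    }
}
```

Without jitter, attempts 1 through 6 with a 1-second base and a 30-second cap give 1s, 2s, 4s, 8s, 16s, and 30s (the sixth attempt would be 32s and is capped).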
Circuit Breaker Pattern
public class CircuitBreakerServiceBusProcessor
{
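private readonly CircuitBreaker _circuitBreaker;
private readonly ServiceBusProcessor _processor;
private readonly ILogger<CircuitBreakerServiceBusProcessor> _logger;
public CircuitBreakerServiceBusProcessor(
ServiceBusClient client,
ILogger<CircuitBreakerServiceBusProcessor> logger)
{
_logger = logger;
_circuitBreaker = new CircuitBreaker(
failureThreshold: 5, // Open after 5 failures
timeout: TimeSpan.FromSeconds(30), // Try again after 30s
samplingDuration: TimeSpan.FromSeconds(60)); // Monitor for 60s
_processor = client.CreateProcessor("orders-queue");
_processor.ProcessMessageAsync += ProcessWithCircuitBreaker;
// The processor refuses to start without an error handler
_processor.ProcessErrorAsync += args =>
{
_logger.LogError(args.Exception, "Processor error from {ErrorSource}", args.ErrorSource);
return Task.CompletedTask;
};
}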
private async Task ProcessWithCircuitBreaker(ProcessMessageEventArgs args)
{
var message = args.Message;
try
{
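// Check if circuit is open
if (!_circuitBreaker.CanExecute)
{
_logger.LogWarning("Circuit breaker is open. Deferring message.");
// Defer the message. Deferred messages are not redelivered automatically:
// store the sequence number and fetch the message later with
// ServiceBusReceiver.ReceiveDeferredMessageAsync(sequenceNumber)
await args.DeferMessageAsync(message, new Dictionary<string, object>
{
["deferred-at"] = DateTime.UtcNow,
["retry-count"] = 0
});
return;
}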
// Try to process
await ProcessMessageAsync(message);
// Success - record success in circuit breaker
_circuitBreaker.RecordSuccess();
await args.CompleteMessageAsync(message);
}
catch (Exception ex)
{
// Failure - record failure in circuit breaker
_circuitBreaker.RecordFailure(ex);
// Handle based on circuit state
if (_circuitBreaker.IsOpen)
{
// Circuit is now open - defer message (retrieve it later by sequence number)
await args.DeferMessageAsync(message);
}
else
{
// Not at threshold yet - try again normally
throw; // Will be retried by Service Bus
}
}
}
}
// Simple circuit breaker implementation
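public class CircuitBreaker
{
    private readonly object _gate = new object();
    private readonly int _failureThreshold;
    private readonly TimeSpan _timeout;
    private readonly TimeSpan _samplingDuration;
    private int _failureCount;
    private DateTime _windowStart = DateTime.UtcNow;
    private DateTime _openedAt;
    private CircuitState _state = CircuitState.Closed;

    // True when calls may proceed. An open circuit moves to half-open once
    // the timeout has elapsed, which lets a trial call through.
    public bool CanExecute
    {
        get
        {
            lock (_gate)
            {
                if (_state == CircuitState.Open && DateTime.UtcNow - _openedAt >= _timeout)
                    _state = CircuitState.HalfOpen;
                return _state != CircuitState.Open;
            }
        }
    }

    public bool IsOpen
    {
        get { lock (_gate) { return _state == CircuitState.Open; } }
    }

    public CircuitBreaker(int failureThreshold, TimeSpan timeout, TimeSpan samplingDuration)
    {
        _failureThreshold = failureThreshold;
        _timeout = timeout;
        _samplingDuration = samplingDuration;
    }

    public void RecordSuccess()
    {
        lock (_gate)
        {
            _failureCount = 0;
            _state = CircuitState.Closed;
        }
    }

    public void RecordFailure(Exception ex)
    {
        lock (_gate)
        {
            var now = DateTime.UtcNow;
            // Start a new sampling window once the previous one has elapsed
            if (now - _windowStart > _samplingDuration)
            {
                _windowStart = now;
                _failureCount = 0;
            }
            _failureCount++;
            // A failed trial call in half-open state reopens the circuit at once
            if (_state == CircuitState.HalfOpen || _failureCount >= _failureThreshold)
            {
                _state = CircuitState.Open;
                _openedAt = now;
            }
        }
    }
}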
public enum CircuitState
{
Closed,
Open,
HalfOpen
}
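The Closed, Open, and HalfOpen transitions are easier to verify when the breaker does not read the system clock directly. The following is a minimal sketch of that idea, not the breaker used in the listing above: `ClockedCircuitBreaker` and `BreakerState` are hypothetical names, the clock is injected as a `Func<DateTimeOffset>`, and the sampling window is left out to keep the example short.

```csharp
using System;

public enum BreakerState { Closed, Open, HalfOpen }

public sealed class ClockedCircuitBreaker
{
    private readonly object _gate = new object();
    private readonly int _failureThreshold;
    private readonly TimeSpan _openTimeout;
    private readonly Func<DateTimeOffset> _now;
    private int _failures;
    private DateTimeOffset _openedAt;
    private BreakerState _state = BreakerState.Closed;

    public ClockedCircuitBreaker(int failureThreshold, TimeSpan openTimeout, Func<DateTimeOffset> now)
    {
        _failureThreshold = failureThreshold;
        _openTimeout = openTimeout;
        _now = now;
    }

    // Reading the state also performs the lazy Open -> HalfOpen transition.
    public BreakerState State
    {
        get
        {
            lock (_gate)
            {
                if (_state == BreakerState.Open && _now() - _openedAt >= _openTimeout)
                    _state = BreakerState.HalfOpen;
                return _state;
            }
        }
    }

    public void RecordSuccess()
    {
        lock (_gate)
        {
            _failures = 0;
            _state = BreakerState.Closed;
        }
    }

    public void RecordFailure()
    {
        lock (_gate)
        {
            _failures++;
            // A failure during the half-open trial reopens the circuit immediately.
            if (_state == BreakerState.HalfOpen || _failures >= _failureThreshold)
            {
                _state = BreakerState.Open;
                _openedAt = _now();
            }
        }
    }
}
```

In a test, a captured variable can serve as the clock: advance it past the timeout and the breaker reports `HalfOpen` without any real waiting.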
Step 3: Processing Dead Letter Messages
Reading and Analyzing DLQ Messages
public class DeadLetterProcessor
{
private readonly ServiceBusClient _client;
private readonly ServiceBusReceiver _dlqReceiver;
private readonly ILogger<DeadLetterProcessor> _logger;
public DeadLetterProcessor(ServiceBusClient client, ILogger<DeadLetterProcessor> logger)
{
_client = client;
_logger = logger;
// Create a receiver specifically for the dead letter queue
// The dead letter queue is the "$DeadLetterQueue" sub-queue of the entity
_dlqReceiver = client.CreateReceiver(
"orders-queue",
new ServiceBusReceiverOptions
{
SubQueue = SubQueue.DeadLetter,
ReceiveMode = ServiceBusReceiveMode.PeekLock
});
}
public async Task<List<DeadLetterInfo>> PeekDeadLettersAsync(int maxMessages = 100)
{
var deadLetters = new List<DeadLetterInfo>();
// Peek messages without locking them
var messages = await _dlqReceiver.PeekMessagesAsync(maxMessages);
foreach (var message in messages)
{
var dlqInfo = new DeadLetterInfo
{
MessageId = message.MessageId,
SequenceNumber = message.SequenceNumber,
EnqueuedTime = message.EnqueuedTime,
Subject = message.Subject,
Body = message.Body.ToString(),
// Dead letter reason and error description
DeadLetterReason = message.DeadLetterReason,
DeadLetterErrorDescription = message.DeadLetterErrorDescription,
// Original properties
Properties = message.ApplicationProperties.ToDictionary(
kvp => kvp.Key.ToString(),
kvp => kvp.Value?.ToString() ?? "null"
),
// Delivery count - how many attempts before DLQ
DeliveryCount = message.DeliveryCount
};
deadLetters.Add(dlqInfo);
}
return deadLetters;
}
public async Task ProcessDeadLettersAsync()
{
// Process all messages in DLQ
while (true)
{
var messages = await _dlqReceiver.ReceiveMessagesAsync(maxMessages: 10,
maxWaitTime: TimeSpan.FromSeconds(5));
if (messages.Count == 0) break;
foreach (var message in messages)
{
try
{
await ProcessSingleDeadLetterAsync(message);
await _dlqReceiver.CompleteMessageAsync(message);
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to process dead letter {MessageId}",
message.MessageId);
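// Abandon - the message becomes available in the DLQ again. A message that
// keeps failing will loop here, so archive it instead if failures persist
await _dlqReceiver.AbandonMessageAsync(message);
}
}
}
}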
private async Task ProcessSingleDeadLetterAsync(ServiceBusReceivedMessage message)
{
_logger.LogInformation("Processing dead letter {MessageId}, Reason: {Reason}",
message.MessageId, message.DeadLetterReason);
var reason = message.DeadLetterReason?.ToLowerInvariant() ?? "";
switch (reason)
{
// Reasons set by Service Bus itself
case "maxdeliverycountexceeded":
await HandleMaxDeliveryExceededAsync(message);
break;
case "ttlexpiredexception":
await HandleTTLExceededAsync(message);
break;
case "session id is null.":
await HandleSessionRequiredAsync(message);
break;
// Reason supplied by a consumer that dead-lettered the message explicitly
case "messagerejected":
await HandleMessageRejectedAsync(message);
break;
default:
await HandleUnknownFailureAsync(message);
break;
}
}
private async Task HandleMaxDeliveryExceededAsync(ServiceBusReceivedMessage message)
{
// Most common case - message failed repeatedly
_logger.LogWarning("Message {MessageId} exceeded max delivery count",
message.MessageId);
// Check if message is still processable
// For example, maybe the order is still valid
var orderAge = DateTimeOffset.UtcNow - message.EnqueuedTime;
if (orderAge > TimeSpan.FromDays(7))
{
// Message is too old, archive it
await ArchiveDeadLetterAsync(message, "expired-order");
_logger.LogWarning("Message {MessageId} is too old, archiving",
message.MessageId);
return;
}
// Try to reprocess
var result = await TryReprocessAsync(message);
if (result.Success)
{
// Reprocessing already handled the order, so do not resubmit it
// to the original queue (that would process it twice)
_logger.LogInformation("Successfully reprocessed message {MessageId}",
message.MessageId);
}
else
{
// Still failing - might be poison message
await ArchiveDeadLetterAsync(message, "poison-message");
}
}
private async Task HandleTTLExceededAsync(ServiceBusReceivedMessage message)
{
_logger.LogWarning("Message {MessageId} exceeded TTL", message.MessageId);
// Analyze why TTL was exceeded: how long did the message wait before it expired?
var processingDelay = message.ExpiresAt - message.EnqueuedTime;
if (processingDelay > TimeSpan.FromDays(1))
{
// Message was queued for over a day before processing
_logger.LogWarning("Message was queued for {Delay} before processing",
processingDelay);
}
// Archive for analysis
await ArchiveDeadLetterAsync(message, "ttl-exceeded");
}
private async Task HandleMessageRejectedAsync(ServiceBusReceivedMessage message)
{
// Message was explicitly rejected by consumer
_logger.LogWarning("Message {MessageId} was explicitly rejected: {Description}",
message.MessageId, message.DeadLetterErrorDescription);
// Log for review
await ArchiveDeadLetterAsync(message, "rejected");
}
private async Task HandleSessionRequiredAsync(ServiceBusReceivedMessage message)
{
// Queue requires sessions but message had none
_logger.LogWarning("Message {MessageId} missing session ID", message.MessageId);
// Add session ID and retry
var body = JsonSerializer.Deserialize<Order>(message.Body);
body.SessionId = body.CustomerId; // Use customer ID as session
await SendToOriginalQueueAsync(body);
}
private async Task HandleUnknownFailureAsync(ServiceBusReceivedMessage message)
{
_logger.LogError("Unknown dead letter reason for message {MessageId}: {Reason}",
message.MessageId, message.DeadLetterReason);
// Archive for manual review
await ArchiveDeadLetterAsync(message, "unknown");
}
}
// Helper classes
public class DeadLetterInfo
{
public string MessageId { get; set; }
public long SequenceNumber { get; set; }
public DateTimeOffset EnqueuedTime { get; set; }
public string Subject { get; set; }
public string Body { get; set; }
public string DeadLetterReason { get; set; }
public string DeadLetterErrorDescription { get; set; }
public Dictionary<string, string> Properties { get; set; }
public int DeliveryCount { get; set; }
}
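The routing decision in the processor above can be separated from the handlers and tested without a Service Bus connection. The sketch below maps a dead-letter reason to a coarse action. The reason strings are the system-set values as I understand them (`MaxDeliveryCountExceeded`, `TTLExpiredException`, `Session ID is null.`), so verify them against messages in your own namespace. `DeadLetterTriage` and `DeadLetterAction` are hypothetical names.

```csharp
#nullable enable
using System;

public enum DeadLetterAction { Retry, Archive, FixAndResubmit, ManualReview }

public static class DeadLetterTriage
{
    // Maps a DeadLetterReason value to the action a DLQ processor should take.
    public static DeadLetterAction Classify(string? reason)
    {
        switch ((reason ?? string.Empty).Trim().ToLowerInvariant())
        {
            case "maxdeliverycountexceeded":
                return DeadLetterAction.Retry;          // may succeed once the fault is fixed
            case "ttlexpiredexception":
                return DeadLetterAction.Archive;        // too late to process
            case "session id is null.":
                return DeadLetterAction.FixAndResubmit; // add a session id, then resend
            default:
                return DeadLetterAction.ManualReview;   // custom or unknown reason
        }
    }
}
```

Reasons set by your own consumers when they dead-letter a message fall through to `ManualReview` unless you add cases for them.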
Automatic Reprocessing Pattern
public class DeadLetterAutoReprocessor
{
private readonly ServiceBusClient _client;
private readonly ServiceBusSender _ordersQueueSender;
private readonly BlobContainerClient _archiveContainer;
private readonly ILogger<DeadLetterAutoReprocessor> _logger;
public DeadLetterAutoReprocessor(ServiceBusClient client,
BlobContainerClient archiveContainer,
ILogger<DeadLetterAutoReprocessor> logger)
{
_client = client;
_ordersQueueSender = client.CreateSender("orders-queue");
_archiveContainer = archiveContainer;
_logger = logger;
}
public async Task<ReprocessResult> TryReprocessAsync(ServiceBusReceivedMessage dlqMessage,
int maxAttempts = 3)
{
for (int attempt = 1; attempt <= maxAttempts; attempt++)
{
try
{
// Parse the message body
var order = JsonSerializer.Deserialize<Order>(dlqMessage.Body);
// Attempt to process
await ProcessOrderWithValidationAsync(order);
_logger.LogInformation("Successfully reprocessed message {MessageId} " +
"on attempt {Attempt}", dlqMessage.MessageId, attempt);
return new ReprocessResult { Success = true, Attempts = attempt };
}
catch (ValidationException ex)
{
// Validation failed - won't succeed on retry
_logger.LogError(ex, "Reprocess validation failed for {MessageId}, " +
"giving up after {Attempts} attempts", dlqMessage.MessageId, attempt);
return new ReprocessResult
{
Success = false,
Attempts = attempt,
Reason = ex.Message,
ShouldArchive = true
};
}
catch (Exception ex)
{
_logger.LogWarning(ex, "Reprocess attempt {Attempt} failed for {MessageId}",
attempt, dlqMessage.MessageId);
if (attempt < maxAttempts)
{
var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt));
await Task.Delay(delay);
}
}
}
return new ReprocessResult
{
Success = false,
Attempts = maxAttempts,
ShouldArchive = true
};
}
private async Task ProcessOrderWithValidationAsync(Order order)
{
// Re-validate the order before processing
if (order == null)
throw new ValidationException("Order is null");
if (string.IsNullOrEmpty(order.OrderId))
throw new ValidationException("OrderId is required");
if (string.IsNullOrEmpty(order.CustomerId))
throw new ValidationException("CustomerId is required");
if (order.Items == null || order.Items.Count == 0)
throw new ValidationException("Order must have items");
// Validate each item
foreach (var item in order.Items)
{
if (string.IsNullOrEmpty(item.ProductId))
throw new ValidationException($"Item {item.LineNumber}: ProductId is required");
if (item.Quantity <= 0)
throw new ValidationException($"Item {item.LineNumber}: Quantity must be positive");
}
// Process via a fresh service instance (avoid stale state)
// In production, you'd call your actual order service
await Task.Delay(100); // Simulate processing
}
private async Task ArchiveDeadLetterAsync(ServiceBusReceivedMessage message,
string archiveReason)
{
// Save to blob storage for later analysis
var archiveFileName = $"dlq-archive/{DateTime.UtcNow:yyyy-MM}/{message.MessageId}.json";
var archiveData = new
{
originalMessageId = message.MessageId,
enqueuedTime = message.EnqueuedTime,
deadLetterReason = message.DeadLetterReason,
deadLetterErrorDescription = message.DeadLetterErrorDescription,
deliveryCount = message.DeliveryCount,
body = message.Body.ToString(),
properties = message.ApplicationProperties,
archivedAt = DateTime.UtcNow,
archiveReason
};
var json = JsonSerializer.Serialize(archiveData, new JsonSerializerOptions
{
WriteIndented = true
});
using var stream = new MemoryStream(Encoding.UTF8.GetBytes(json));
await _archiveContainer.UploadBlobAsync(archiveFileName, stream);
_logger.LogInformation("Archived dead letter {MessageId} to {FileName}",
message.MessageId, archiveFileName);
}
}
public class ReprocessResult
{
public bool Success { get; set; }
public int Attempts { get; set; }
public string Reason { get; set; }
public bool ShouldArchive { get; set; }
}
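The retry loop in `TryReprocessAsync` follows a general shape: stop at once on a permanent failure, wait and retry on anything else, and report how many attempts were used. Here is a minimal, dependency-free sketch of that shape. `Retry`, `RetryOutcome`, and `PermanentFailureException` are hypothetical names, and the wait is passed in as a delegate so that tests can skip real delays.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

public sealed class PermanentFailureException : Exception
{
    public PermanentFailureException(string message) : base(message) { }
}

public sealed class RetryOutcome
{
    public RetryOutcome(bool success, int attempts, string? reason)
    {
        Success = success;
        Attempts = attempts;
        Reason = reason;
    }
    public bool Success { get; }
    public int Attempts { get; }
    public string? Reason { get; }
}

public static class Retry
{
    public static async Task<RetryOutcome> RunAsync(
        Func<Task> operation, int maxAttempts, Func<int, Task> waitBeforeRetry)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        string? lastError = null;
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await operation();
                return new RetryOutcome(true, attempt, null);
            }
            catch (PermanentFailureException ex)
            {
                // Retrying cannot help, so give up immediately.
                return new RetryOutcome(false, attempt, ex.Message);
            }
            catch (Exception ex)
            {
                lastError = ex.Message;
                if (attempt < maxAttempts)
                    await waitBeforeRetry(attempt);
            }
        }
        return new RetryOutcome(false, maxAttempts, lastError);
    }
}
```

In production the wait delegate could be `attempt => Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)))`, matching the delays in the listing above.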
Step 4: Monitoring and Alerts
Setting Up DLQ Monitoring
public class DeadLetterQueueMonitor
{
private readonly ServiceBusAdministrationClient _adminClient;
private readonly ILogger<DeadLetterQueueMonitor> _logger;
private readonly string _queueName;
public DeadLetterQueueMonitor(
ServiceBusAdministrationClient adminClient,
string queueName,
ILogger<DeadLetterQueueMonitor> logger)
{
_adminClient = adminClient;
_queueName = queueName;
_logger = logger;
}
public async Task<DeadLetterMetrics> GetMetricsAsync()
{
// Runtime properties carry the message counts, including the DLQ depth
var runtimeProperties = await _adminClient.GetQueueRuntimePropertiesAsync(_queueName);
var runtime = runtimeProperties.Value;
return new DeadLetterMetrics
{
ActiveMessages = runtime.ActiveMessageCount,
DeadLetterCount = runtime.DeadLetterMessageCount,
// Size covers the whole queue entity, not only the DLQ
SizeInBytes = runtime.SizeInBytes,
CreatedAt = runtime.CreatedAt,
UpdatedAt = runtime.UpdatedAt
};
}
public async Task<bool> CheckThresholdExceededAsync(int warningThreshold = 10)
{
var metrics = await GetMetricsAsync();
if (metrics.DeadLetterCount > warningThreshold)
{
_logger.LogWarning(
"Dead letter queue has {Count} messages (threshold: {Threshold}). " +
"Immediate attention required.",
metrics.DeadLetterCount, warningThreshold);
return true;
}
return false;
}
public async Task SendAlertIfNeededAsync(int warningThreshold = 10,
int criticalThreshold = 100)
{
var metrics = await GetMetricsAsync();
if (metrics.DeadLetterCount > criticalThreshold)
{
// Critical: Many messages in DLQ
await SendCriticalAlertAsync(metrics);
}
else if (metrics.DeadLetterCount > warningThreshold)
{
// Warning: Elevated DLQ count
await SendWarningAlertAsync(metrics);
}
}
private async Task SendCriticalAlertAsync(DeadLetterMetrics metrics)
{
// Send to your monitoring system (Azure Monitor, PagerDuty, etc.)
_logger.LogCritical(
"CRITICAL: Dead letter queue for {Queue} has {Count} messages. " +
"Queue size: {Size}MB. Investigation required immediately.",
_queueName, metrics.DeadLetterCount, metrics.SizeInBytes / (1024 * 1024));
// Example: Send to Azure Monitor
// await _monitorClient.LogAlertAsync("Critical DLQ Alert", metrics);
}
private async Task SendWarningAlertAsync(DeadLetterMetrics metrics)
{
_logger.LogWarning(
"WARNING: Dead letter queue for {Queue} has {Count} messages. " +
"Please investigate.",
_queueName, metrics.DeadLetterCount);
}
}
public class DeadLetterMetrics
{
public long ActiveMessages { get; set; }
public long DeadLetterCount { get; set; }
public long SizeInBytes { get; set; }
public DateTimeOffset CreatedAt { get; set; }
public DateTimeOffset UpdatedAt { get; set; }
}
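The warning and critical thresholds used by the monitor can be expressed as a pure function, which keeps the alerting rule testable on its own. This sketch mirrors the strict greater-than comparisons in `SendAlertIfNeededAsync`. `DlqThresholds` and `DlqSeverity` are hypothetical names.

```csharp
using System;

public enum DlqSeverity { Ok, Warning, Critical }

public static class DlqThresholds
{
    // Classifies a DLQ depth against warning and critical thresholds.
    public static DlqSeverity Evaluate(long deadLetterCount,
        long warningThreshold = 10, long criticalThreshold = 100)
    {
        if (warningThreshold > criticalThreshold)
            throw new ArgumentException("Warning threshold must not exceed critical threshold.");

        if (deadLetterCount > criticalThreshold) return DlqSeverity.Critical;
        if (deadLetterCount > warningThreshold) return DlqSeverity.Warning;
        return DlqSeverity.Ok;
    }
}
```

With the defaults, a depth of 10 is still `Ok`, 11 is `Warning`, and 101 is `Critical`.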
Step 5: Best Practices and Patterns
Complete Implementation Example
// Putting it all together - Production-ready message processor
public class ProductionMessageProcessor : IAsyncDisposable
{
private readonly ServiceBusClient _client;
private readonly ServiceBusProcessor _processor;
private readonly ServiceBusProcessor _dlqProcessor;
private readonly ILogger<ProductionMessageProcessor> _logger;
private readonly DeadLetterQueueMonitor _monitor;
public ProductionMessageProcessor(
ServiceBusClient client,
ServiceBusAdministrationClient adminClient,
ILogger<ProductionMessageProcessor> logger)
{
_client = client;
_logger = logger;
// Configure main processor
_processor = client.CreateProcessor(
"orders-queue",
new ServiceBusProcessorOptions
{
MaxConcurrentCalls = 10,
AutoCompleteMessages = false,
PrefetchCount = 20
});
// Configure DLQ processor
_dlqProcessor = client.CreateProcessor(
"orders-queue",
new ServiceBusProcessorOptions
{
SubQueue = SubQueue.DeadLetter,
MaxConcurrentCalls = 5,
AutoCompleteMessages = false
});
// Set up handlers
_processor.ProcessMessageAsync += HandleMessageAsync;
_processor.ProcessErrorAsync += HandleErrorAsync;
_dlqProcessor.ProcessMessageAsync += HandleDeadLetterAsync;
_dlqProcessor.ProcessErrorAsync += HandleErrorAsync;
_monitor = new DeadLetterQueueMonitor(adminClient, "orders-queue");
}
public async Task StartProcessingAsync()
{
await _processor.StartProcessingAsync();
await _dlqProcessor.StartProcessingAsync();
_logger.LogInformation("Message processors started");
// Start background monitoring
_ = MonitorDLQAsync();
}
private async Task HandleMessageAsync(ProcessMessageEventArgs args)
{
var message = args.Message;
var processingStart = DateTime.UtcNow;
try
{
// 1. Log incoming message
_logger.LogInformation("Processing message {MessageId}, Sequence: {Sequence}",
message.MessageId, message.SequenceNumber);
// 2. Validate message
var order = ValidateAndParseOrder(message);
// 3. Process order
await ProcessOrderAsync(order);
// 4. Complete message
await args.CompleteMessageAsync(message);
var processingTime = DateTime.UtcNow - processingStart;
_logger.LogInformation(
"Message {MessageId} processed successfully in {Time}ms",
message.MessageId, processingTime.TotalMilliseconds);
}
catch (ValidationException ex)
{
// Permanent failure - dead letter immediately
_logger.LogError(ex, "Validation failed for {MessageId}, dead-lettering",
message.MessageId);
// Pass reason and description so they appear as DeadLetterReason and
// DeadLetterErrorDescription on the dead-lettered message
await args.DeadLetterMessageAsync(message, "validation-failed", ex.Message);
}
catch (Exception ex) when (IsTransient(ex))
{
// Transient failure - let Service Bus handle retry
var deliveryCount = message.DeliveryCount;
if (deliveryCount >= 3)
{
// Too many attempts - dead letter
_logger.LogError(ex, "Message {MessageId} failed after {Count} attempts",
message.MessageId, deliveryCount);
await args.DeadLetterMessageAsync(message, "max-retries-exceeded", ex.Message);
}
else
{
// Abandon - will be retried
_logger.LogWarning(ex, "Transient error on {MessageId}, attempt {Count}",
message.MessageId, deliveryCount);
await args.AbandonMessageAsync(message);
}
}
catch (Exception ex)
{
// Unexpected error - dead letter for investigation
_logger.LogError(ex, "Unexpected error processing {MessageId}", message.MessageId);
await args.DeadLetterMessageAsync(message, "unexpected-error",
$"{ex.GetType().Name}: {ex.Message}");
}
}
private async Task HandleDeadLetterAsync(ProcessMessageEventArgs args)
{
var dlqMessage = args.Message;
_logger.LogWarning(
"Processing dead letter {MessageId}, Reason: {Reason}, Error: {Error}",
dlqMessage.MessageId,
dlqMessage.DeadLetterReason,
dlqMessage.DeadLetterErrorDescription);
try
{
// Attempt to reprocess
var result = await TryReprocessAsync(dlqMessage);
if (result.Success)
{
// Reprocessed successfully - complete it. Do not also send it back to the
// main queue, or the order would be processed twice
await args.CompleteMessageAsync(dlqMessage);
_logger.LogInformation("Successfully reprocessed DLQ message {MessageId}",
dlqMessage.MessageId);
}
else
{
// Still failing - archive and complete
await ArchiveToStorageAsync(dlqMessage, result.Reason);
await args.CompleteMessageAsync(dlqMessage);
}
}
catch (Exception ex)
{
_logger.LogError(ex, "Error processing DLQ message {MessageId}",
dlqMessage.MessageId);
// Try to archive before abandoning
await ArchiveToStorageAsync(dlqMessage, "processing-error");
await args.AbandonMessageAsync(dlqMessage);
}
}
private async Task MonitorDLQAsync()
{
while (true)
{
try
{
await _monitor.SendAlertIfNeededAsync(10, 100);
}
catch (Exception ex)
{
_logger.LogError(ex, "Error monitoring DLQ");
}
await Task.Delay(TimeSpan.FromMinutes(5));
}
}
// Helper methods
private Order ValidateAndParseOrder(ServiceBusReceivedMessage message)
{
var order = message.Body.ToObjectFromJson<Order>();
if (string.IsNullOrEmpty(order?.OrderId))
throw new ValidationException("OrderId is required");
return order;
}
private async Task ProcessOrderAsync(Order order)
{
// Your business logic here
await Task.Delay(100); // Simulate processing
}
private Task HandleErrorAsync(ProcessErrorEventArgs args)
{
_logger.LogError(args.Exception,
"Processor error. Entity: {Entity}, ErrorSource: {ErrorSource}",
args.EntityPath, args.ErrorSource);
return Task.CompletedTask;
}
private bool IsTransient(Exception ex)
{
return ex is TimeoutException ||
(ex as ServiceBusException)?.IsTransient == true;
}
public async ValueTask DisposeAsync()
{
await _processor.StopProcessingAsync();
await _dlqProcessor.StopProcessingAsync();
await _processor.DisposeAsync();
await _dlqProcessor.DisposeAsync();
}
}
Best Practices Summary
| Practice | Why | Implementation |
|---|---|---|
| Set MaxDeliveryCount to 3-5 | Balance between retry and quick failure detection | Don't use default (10) |
| Use exponential backoff | Prevents overwhelming failed services | 1s, 2s, 4s, 8s pattern |
| Dead-letter validation errors immediately | No point retrying invalid data | Check exception type |
| Monitor DLQ depth | Early warning of issues | Azure Monitor alerts |
| Archive old DLQ messages | Prevents infinite storage growth | Auto-archive after 7 days |
| Implement circuit breaker | Prevents cascade failures | CircuitBreaker pattern |
| Use message properties for tracking | Better debugging | Add retry count, errors |
Testing Your DLQ Implementation
// Integration tests for dead letter handling
public class DeadLetterQueueTests
{
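private readonly ServiceBusClient _client;
private readonly string _testQueue;
// Assumption: the connection string and queue name come from environment
// variables; the variable names here are placeholders for your own setup
public DeadLetterQueueTests()
{
_client = new ServiceBusClient(
Environment.GetEnvironmentVariable("SERVICEBUS_CONNECTION_STRING"));
_testQueue = Environment.GetEnvironmentVariable("SERVICEBUS_TEST_QUEUE") ?? "orders-queue-test";
}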
[Fact]
public async Task Message_ExceedingMaxDelivery_GoesToDLQ()
{
// Arrange
var sender = _client.CreateSender(_testQueue);
var message = new ServiceBusMessage("test-body")
{
ApplicationProperties = { ["test"] = "true" }
};
// Act - send message that will fail
await sender.SendMessageAsync(message);
// Simulate max delivery attempts
var receiver = _client.CreateReceiver(_testQueue);
var received = await receiver.ReceiveMessageAsync();
// Abandon (don't complete) 3+ times to trigger DLQ
for (int i = 0; i < 3; i++)
{
await receiver.AbandonMessageAsync(received);
received = await receiver.ReceiveMessageAsync(TimeSpan.FromSeconds(5));
}
// Assert - check DLQ
var dlqReceiver = _client.CreateReceiver(_testQueue,
new ServiceBusReceiverOptions { SubQueue = SubQueue.DeadLetter });
var dlqMessage = await dlqReceiver.PeekMessageAsync();
Assert.NotNull(dlqMessage);
Assert.Equal("MaxDeliveryCountExceeded", dlqMessage.DeadLetterReason);
}
[Fact]
public async Task ValidationError_DeadLettersImmediately()
{
// Arrange
var sender = _client.CreateSender(_testQueue);
// Invalid message (missing required field)
var message = new ServiceBusMessage("{}");
// Act
await sender.SendMessageAsync(message);
// Wait for processing
await Task.Delay(TimeSpan.FromSeconds(2));
// Assert - DLQ should have the message
var dlqReceiver = _client.CreateReceiver(_testQueue,
new ServiceBusReceiverOptions { SubQueue = SubQueue.DeadLetter });
var dlqMessage = await dlqReceiver.PeekMessageAsync();
Assert.NotNull(dlqMessage);
}
}
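The integration tests above need a live namespace. For fast unit tests of handler logic, the max-delivery rule can be simulated with an in-memory stand-in. The sketch below is a deliberately simplified fake, not a model of Service Bus: `InMemoryQueue` and `FakeMessage` are hypothetical types, there are no locks or timeouts, and an abandoned message goes to the back of the queue rather than keeping its position.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed class FakeMessage
{
    public FakeMessage(string body) { Body = body; }
    public string Body { get; }
    public int DeliveryCount { get; internal set; }
}

public sealed class InMemoryQueue
{
    private readonly Queue<FakeMessage> _active = new Queue<FakeMessage>();
    private readonly List<FakeMessage> _deadLetters = new List<FakeMessage>();
    private readonly int _maxDeliveryCount;

    public InMemoryQueue(int maxDeliveryCount)
    {
        if (maxDeliveryCount < 1)
            throw new ArgumentOutOfRangeException(nameof(maxDeliveryCount));
        _maxDeliveryCount = maxDeliveryCount;
    }

    public IReadOnlyList<FakeMessage> DeadLetters => _deadLetters;

    public void Send(FakeMessage message) => _active.Enqueue(message);

    // Hands out the next message and counts the delivery; null when empty.
    public FakeMessage? Receive()
    {
        if (_active.Count == 0) return null;
        var message = _active.Dequeue();
        message.DeliveryCount++;
        return message;
    }

    // Completing removes the message for good; it was already dequeued.
    public void Complete(FakeMessage message) { }

    // Abandoning requeues the message until the delivery limit is reached,
    // after which it moves to the dead-letter list.
    public void Abandon(FakeMessage message)
    {
        if (message.DeliveryCount >= _maxDeliveryCount)
            _deadLetters.Add(message);
        else
            _active.Enqueue(message);
    }
}
```

With a limit of 3, a message that is received and abandoned three times ends up in `DeadLetters`, and the next `Receive()` returns null.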
Conclusion
Dead Letter Queues are essential for building reliable messaging systems. Key takeaways:
- Configure appropriately - Set MaxDeliveryCount to 3-5, not the default 10
- Implement smart retry - Use exponential backoff, not fixed delays
- Distinguish error types - Validation errors should skip retries
- Monitor actively - Set up alerts for DLQ depth thresholds
- Process systematically - Don't let messages rot in DLQ forever
- Test thoroughly - Verify your failure handling works
With these patterns, your messaging system will handle failures gracefully while giving you visibility into what's going wrong.