Transactional Outbox at Enterprise Scale
Reliable Event Publishing for Distributed Systems
Introduction
The Transactional Outbox pattern is a critical pattern for ensuring reliable event-driven communication in distributed systems. When you need to both update a database and publish an event (like notifying other services of that change), doing so in a single transaction isn't possible across different systems. The outbox pattern solves this by writing events to a database "outbox" table as part of the same transaction as the business data, then having a separate process publish those events reliably.
This comprehensive guide covers:
- Pattern fundamentals — Understanding outbox pattern
- Implementation approaches — Database-driven vs. log-based
- Azure implementation — Cosmos DB, SQL, Event Hub
- Scaling considerations — Enterprise-level performance
- Monitoring and recovery — Ensuring reliability
Understanding the Pattern
The Problem It Solves
┌─────────────────────────────────────────────────────────────────────┐
│ THE TRANSACTION PROBLEM │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Naive Approach (Two Separate Operations): │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Update │ │ Publish │ │
│ │ Database │ │ Event │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Success │ │ Failed │ │
│ │ │ │ │ │
│ │ Order │ │ No event │ │
│ │ updated │ │ published │ │
│ └─────────────┘ └─────────────┘ │
│ │
│ Problem: INCONSISTENCY - Database updated, but event never │
│ published. Other systems don't know about the change. │
│ │
│ ───────────────────────────────────────────────────────────────── │
│ │
│ Transactional Outbox Solution: │
│ │
│ ┌────────────────────────────┐ │
│ │ SINGLE TRANSACTION │ │
│ │ ┌──────────┐ ┌───────────┐│ │
│ │ │ Update │ │ Write to ││ │
│ │ │ Order │ │ Outbox ││ │
│ │ │ Table │ │ Table ││ │
│ │ └──────────┘ └───────────┘│ │
│ └────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ | Both: Order │ │
│ |updated AND │ │
│ |event written│ │
│ └─────────────┘ │
│ │
│ ✓ Atomicity: Both succeed or both fail │
│ ✓ Reliability: Events can be processed later if publisher fails │
│ │
└─────────────────────────────────────────────────────────────────────┘
How the Pattern Works
┌─────────────────────────────────────────────────────────────────────┐
│ OUTBOX PATTERN ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ APPLICATION │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ DATABASE (ACID) │ │ │
│ │ │ │ │ │
│ │ │ ┌────────────────┐ ┌────────────────────────┐ │ │ │
│ │ │ │ Orders Table │ │ Outbox Table │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ id: ORD-123 │ │ id: uuid │ │ │ │
│ │ │ │ status: Shipped│ │ eventType: OrderShipped│ │ │ │
│ │ │ │ ... │ │ payload: {...} │ │ │ │
│ │ │ └────────────────┘ │ processed: false │ │ │ │
│ │ │ │ createdAt: ... │ │ │ │
│ │ │ └────────────────────────┘ │ │ │
│ │ │ │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ Single Transaction │ │
│ │ │ │ │
│ └──────────────────────┼─────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────┼──────────────────────────────────────┐ │
│ │ ▼ │ │
│ │ ┌───────────────────────────────────────────────────────┐ │ │
│ │ │ OUTBOX PROCESSOR │ │ │
│ │ │ │ │ │
│ │ │ 1. Read unprocessed events │ │ │
│ │ │ 2. Publish to message broker │ │ │
│ │ │ 3. Mark as processed │ │ │
│ │ │ 4. Repeat │ │ │
│ │ │ │ │ │
│ │ └───────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ Published Events │ │
│ │ ▼ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Subscriber │ │ Subscriber │ │ Subscriber │ │ │
│ │ │ A │ │ B │ │ C │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Azure Implementation
Cosmos DB Implementation
public class CosmosOutboxProcessor
{
private readonly Container _outboxContainer;
private readonly Container _ordersContainer;
private readonly ServiceBusClient _serviceBusClient;
private readonly ILogger<CosmosOutboxProcessor> _logger;
public async Task ProcessOutboxAsync(CancellationToken ct)
{
var query = new QueryDefinition(
"SELECT * FROM c WHERE c.processed = false ORDER BY c.createdAt");
var events = new List<OutboxEvent>();
await foreach (var item in _outboxContainer
.GetItemQueryIterator<OutboxEvent>(query))
{
events.Add(item);
}
foreach (var @event in events)
{
try
{
// Publish to Service Bus
var sender = _serviceBusClient.CreateSender(@event.TopicName);
var message = new ServiceBusMessage(@event.Payload)
{
ContentType = "application/json",
MessageId = @event.Id,
CorrelationId = @event.CorrelationId,
Subject = @event.EventType
};
await sender.SendMessageAsync(message, ct);
// Mark as processed
@event.ProcessedAt = DateTime.UtcNow;
@event.Processed = true;
await _outboxContainer.ReplaceItemAsync(
@event.Id, @event, new ItemRequestOptions(), ct);
_logger.LogInformation(
"Published event {EventId} of type {EventType}",
@event.Id, @event.EventType);
}
catch (Exception ex)
{
_logger.LogError(ex,
"Failed to process event {EventId}", @event.Id);
@event.RetryCount++;
@event.LastError = ex.Message;
await _outboxContainer.ReplaceItemAsync(
@event.Id, @event, new ItemRequestOptions(), ct);
}
}
}
}
// Writing to outbox (same transaction as business data)
public async Task ShipOrderAsync(string orderId, string trackingNumber)
{
var transactions = new List<TransactionalBatchOperation>();
// Update order
var orderUpdate = new { Status = "Shipped", TrackingNumber = trackingNumber };
transactions.Add(_ordersContainer.CreateTransactionalBatchItem(
orderId,
JsonSerializer.Serialize(orderUpdate),
new ItemRequestOptions { EnableContentResponseOnWrite = false }));
// Write to outbox
var outboxEvent = new OutboxEvent
{
Id = Guid.NewGuid().ToString(),
EventType = "OrderShipped",
TopicName = "order-events",
Payload = JsonSerializer.Serialize(new
{
OrderId = orderId,
TrackingNumber = trackingNumber,
ShippedAt = DateTime.UtcNow
}),
CorrelationId = orderId,
CreatedAt = DateTime.UtcNow,
Processed = false
};
transactions.Add(_outboxContainer.CreateTransactionalBatchItem(
outboxEvent));
// Execute atomically
var response = await _container.CreateTransactionalBatch(transactions)
.ExecuteAsync();
if (!response.IsSuccessStatusCode)
throw new Exception("Transaction failed");
}
SQL Server Implementation
-- Outbox table schema
CREATE TABLE OutboxMessages (
Id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(),
EventType NVARCHAR(100) NOT NULL,
TopicName NVARCHAR(100) NOT NULL,
Payload NVARCHAR(MAX) NOT NULL,
CorrelationId NVARCHAR(100),
CreatedAt DATETIME2 NOT NULL DEFAULT GETUTCDATE(),
ProcessedAt DATETIME2,
RetryCount INT DEFAULT 0,
LastError NVARCHAR(MAX),
Processed BIT DEFAULT 0
);
-- Index for efficient processing
CREATE INDEX IX_Outbox_Unprocessed
ON OutboxMessages (CreatedAt)
WHERE Processed = 0;
public class SqlOutboxProcessor
{
private readonly string _connectionString;
private readonly ServiceBusClient _serviceBusClient;
public async Task ProcessOutboxAsync()
{
await using var connection = new SqlConnection(_connectionString);
await connection.OpenAsync();
await using var transaction = await connection.BeginTransactionAsync();
try
{
// Get unprocessed events (with row lock)
var events = await new SqlCommand(@"
SELECT TOP 100 * FROM OutboxMessages
WHERE Processed = 0
ORDER BY CreatedAt
FOR UPDATE SKIP LOCK", connection, (SqlTransaction)transaction)
.ExecuteReaderAsync();
var eventsToProcess = new List<OutboxMessage>();
while (await events.ReadAsync())
{
eventsToProcess.Add(MapToOutboxMessage(events));
}
await events.CloseAsync();
// Publish each event
foreach (var @event in eventsToProcess)
{
await PublishEventAsync(@event);
// Mark as processed
await new SqlCommand(@"
UPDATE OutboxMessages
SET Processed = 1, ProcessedAt = GETUTCDATE()
WHERE Id = @Id", connection, (SqlTransaction)transaction)
{
Parameters = { new("@Id", @event.Id) }
}.ExecuteNonQueryAsync();
}
await transaction.CommitAsync();
}
catch
{
await transaction.RollbackAsync();
throw;
}
}
}
Enterprise Scaling
High-Throughput Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ ENTERPRISE OUTBOX ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Multiple Processors (Scale Out) │
│ ─────────────────────────────── │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Processor │ │ Processor │ │ Processor │ │
│ │ 1 │ │ 2 │ │ 3 │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └───────────────┼───────────────┘ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Partitioned │ │
│ │ Outbox Table │ │
│ │ (by EventType) │ │
│ └─────────────────────┘ │
│ │
│ Distribution: │
│ • Each processor handles specific event types │
│ • No duplicate processing (partitioning) │
│ • Scale independently based on volume │
│ │
└─────────────────────────────────────────────────────────────────────┘
Batch Processing
public class BatchOutboxProcessor
{
private readonly int _batchSize = 100;
private readonly TimeSpan _batchTimeout = TimeSpan.FromSeconds(30);
public async Task ProcessBatchesAsync(CancellationToken ct)
{
var batch = new List<OutboxEvent>();
using var cts = CancellationTokenSource
.CreateLinkedTokenSource(ct);
cts.CancelAfter(_batchTimeout);
while (!ct.IsCancellationRequested)
{
var events = await GetNextBatchAsync(batchSize);
if (events.Count == 0)
{
// Wait for new events
await Task.Delay(TimeSpan.FromSeconds(1), ct);
continue;
}
// Process batch
await ProcessEventBatchAsync(events, ct);
}
}
private async Task ProcessEventBatchAsync(
List<OutboxEvent> events, CancellationToken ct)
{
// Group by topic for efficient sending
var byTopic = events.GroupBy(e => e.TopicName);
foreach (var group in byTopic)
{
var messages = group.Select(e =>
new ServiceBusMessage(e.Payload)
{
MessageId = e.Id
});
var sender = _serviceBusClient.CreateSender(group.Key);
await sender.SendMessagesAsync(messages, ct);
}
// Mark all as processed
await MarkAsProcessedAsync(events.Select(e => e.Id));
}
}
Idempotent Processing
public class IdempotentOutboxProcessor
{
public async Task ProcessWithDeduplicationAsync(OutboxEvent @event)
{
// Check if already processed (idempotency key)
var cacheKey = $"outbox:{@event.Id}";
var isProcessing = await _distributedLock
.AcquireAsync(cacheKey, TimeSpan.FromMinutes(5));
if (!isProcessing)
{
// Already being processed by another node
return;
}
try
{
await PublishEventAsync(@event);
// Mark processed only after successful publish
await MarkAsProcessedAsync(@event.Id);
}
finally
{
await _distributedLock.ReleaseAsync(cacheKey);
}
}
// Also handle at subscriber level
public async Task HandleEventAsync(ServiceBusReceivedMessage message)
{
var idempotencyKey = message.MessageId;
// Check if already handled
var handled = await _redis.GetAsync<bool>(
$"handled:{idempotencyKey}");
if (handled)
{
// Already processed - skip
return;
}
// Process the event
await ProcessEventAsync(message);
// Mark as handled
await _redis.SetAsync(
$"handled:{idempotencyKey}",
true,
TimeSpan.FromDays(7));
}
}
Monitoring and Operations
Key Metrics
{
"outboxMetrics": {
"lag": {
"description": "Time from event creation to processing",
"alert": "If > 5 minutes for 95th percentile"
},
"processingRate": {
"description": "Events processed per second",
"track": "Per event type"
},
"retryRate": {
"description": "Percentage of events requiring retry",
"alert": "If > 5%"
},
"deadLetter": {
"description": "Events moved to DLQ after max retries",
"alert": "Any dead letter needs investigation"
},
"failedEventTypes": {
"description": "Which event types are failing",
"alert": "Track per event type"
}
}
}
Alerting Rules
{
"alertRules": [
{
"name": "OutboxLagHigh",
"condition": "outbox_processing_lag_p95 > 300",
"severity": "Warning",
"action": "Investigate processor throughput"
},
{
"name": "OutboxRetryHigh",
"condition": "outbox_retry_rate > 0.05",
"severity": "Warning",
"action": "Check downstream service health"
},
{
"name": "OutboxDeadLetter",
"condition": "outbox_dead_letter_count > 0",
"severity": "Critical",
"action": "Immediate investigation required"
}
]
}
Best Practices
Implementation Checklist
| Practice | Description |
|---|---|
| Same transaction | Outbox write must be in same transaction as business logic |
| Idempotent consumers | Subscribers must handle duplicate events |
| Process in order | Events for same entity should maintain order |
| Handle failures | Dead letter queue for permanently failed events |
| Monitor lag | Track time from creation to processing |
| Scale horizontally | Partition by event type for high throughput |
Common Pitfalls
┌─────────────────────────────────────────────────────────────────────┐
│ OUTBOX PATTERN PITFALLS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ✗ Processing outbox in separate transaction (breaks atomicity) │
│ ✗ Not handling duplicate events at subscriber │
│ ✗ Publishing before transaction commits │
│ ✗ No retry logic for failed publications │
│ ✗ Missing monitoring for outbox lag │
│ ✗ Processing events for same entity in parallel │
│ │
└─────────────────────────────────────────────────────────────────────┘
Related Topics
- Saga Pattern — Distributed transactions
- Event Sourcing — Event-driven architecture
- Competing Consumers — Message processing
Azure Integration Hub - Architect Level Enterprise Integration Patterns