Transactional Outbox at Enterprise Scale

Reliable Event Publishing for Distributed Systems


Introduction

The Transactional Outbox pattern is a critical pattern for ensuring reliable event-driven communication in distributed systems. When you need to both update a database and publish an event (like notifying other services of that change), doing so in a single transaction isn't possible across different systems. The outbox pattern solves this by writing events to a database "outbox" table as part of the same transaction as the business data, then having a separate process publish those events reliably.

This comprehensive guide covers:

  • Pattern fundamentals — Understanding outbox pattern
  • Implementation approaches — Database-driven vs. log-based
  • Azure implementation — Cosmos DB, SQL, Event Hub
  • Scaling considerations — Enterprise-level performance
  • Monitoring and recovery — Ensuring reliability

Understanding the Pattern

The Problem It Solves

┌─────────────────────────────────────────────────────────────────────┐
│                  THE TRANSACTION PROBLEM                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   Naive Approach (Two Separate Operations):                         │
│                                                                     │
│   ┌─────────────────┐         ┌─────────────────┐                   │
│   │  Update         │         │  Publish        │                   │
│   │  Database       │         │  Event          │                   │
│   └────────┬────────┘         └────────┬────────┘                   │
│            │                           │                            │
│            ▼                           ▼                            │
│      ┌─────────────┐             ┌─────────────┐                    │
│      │   Success   │             │   Failed    │                    │
│      │             │             │             │                    │
│      │   Order     │             │   No event  │                    │
│      │   updated   │             │   published │                    │
│      └─────────────┘             └─────────────┘                    │
│                                                                     │
│   Problem: INCONSISTENCY - Database updated, but event never        │
│   published. Other systems don't know about the change.             │
│                                                                     │
│   ───────────────────────────────────────────────────────────────── │
│                                                                     │
│   Transactional Outbox Solution:                                    │
│                                                                     │
│   ┌────────────────────────────┐                                    │
│   │      SINGLE TRANSACTION    │                                    │
│   │  ┌──────────┐ ┌───────────┐│                                    │
│   │  │ Update   │ │ Write to  ││                                    │
│   │  │ Order    │ │ Outbox    ││                                    │
│   │  │ Table    │ │ Table     ││                                    │
│   │  └──────────┘ └───────────┘│                                    │
│   └────────────────────────────┘                                    │
│            │                                                        │
│            ▼                                                        │
│      ┌─────────────┐                                                │
│      | Both: Order │                                                │
│      |updated AND  │                                                │
│      |event written│                                                │
│      └─────────────┘                                                │
│                                                                     │
│   ✓ Atomicity: Both succeed or both fail                            │
│   ✓ Reliability: Events can be processed later if publisher fails   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

How the Pattern Works

┌─────────────────────────────────────────────────────────────────────┐
│                  OUTBOX PATTERN ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌────────────────────────────────────────────────────────────┐    │
│   │                    APPLICATION                             │    │
│   │                                                            │    │
│   │   ┌──────────────────────────────────────────────────────┐ │    │
│   │   │              DATABASE (ACID)                         │ │    │
│   │   │                                                      │ │    │
│   │   │   ┌────────────────┐    ┌────────────────────────┐   │ │    │
│   │   │   │  Orders Table  │    │   Outbox Table         │   │ │    │
│   │   │   │                │    │                        │   │ │    │
│   │   │   │ id: ORD-123    │    │ id: uuid               │   │ │    │
│   │   │   │ status: Shipped│    │ eventType: OrderShipped│   │ │    │
│   │   │   │ ...            │    │ payload: {...}         │   │ │    │
│   │   │   └────────────────┘    │ processed: false       │   │ │    │
│   │   │                         │ createdAt: ...         │   │ │    │
│   │   │                         └────────────────────────┘   │ │    │
│   │   │                                                      │ │    │
│   │   └──────────────────────────────────────────────────────┘ │    │
│   │                      │                                     │    │
│   │           Single Transaction                               │    │
│   │                      │                                     │    │
│   └──────────────────────┼─────────────────────────────────────┘    │
│                          │                                          │
│   ┌──────────────────────┼──────────────────────────────────────┐   │
│   │                      ▼                                      │   │
│   │   ┌───────────────────────────────────────────────────────┐ │   │
│   │   │              OUTBOX PROCESSOR                         │ │   │
│   │   │                                                       │ │   │
│   │   │   1. Read unprocessed events                          │ │   │
│   │   │   2. Publish to message broker                        │ │   │
│   │   │   3. Mark as processed                                │ │   │
│   │   │   4. Repeat                                           │ │   │
│   │   │                                                       │ │   │
│   │   └───────────────────────────────────────────────────────┘ │   │
│   │                      │                                      │   │
│   │           Published Events                                  │   │
│   │                      ▼                                      │   │
│   │   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │   │
│   │   │  Subscriber  │  │  Subscriber  │  │  Subscriber  │      │   │
│   │   │  A           │  │  B           │  │  C           │      │   │
│   │   └──────────────┘  └──────────────┘  └──────────────┘      │   │
│   │                                                             │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Azure Implementation

Cosmos DB Implementation

public class CosmosOutboxProcessor
{
    private readonly Container _outboxContainer;
    private readonly Container _ordersContainer;
    private readonly ServiceBusClient _serviceBusClient;
    private readonly ILogger<CosmosOutboxProcessor> _logger;

    public async Task ProcessOutboxAsync(CancellationToken ct)
    {
        var query = new QueryDefinition(
            "SELECT * FROM c WHERE c.processed = false ORDER BY c.createdAt");

        var events = new List<OutboxEvent>();
        
        await foreach (var item in _outboxContainer
            .GetItemQueryIterator<OutboxEvent>(query))
        {
            events.Add(item);
        }

        foreach (var @event in events)
        {
            try
            {
                // Publish to Service Bus
                var sender = _serviceBusClient.CreateSender(@event.TopicName);
                
                var message = new ServiceBusMessage(@event.Payload)
                {
                    ContentType = "application/json",
                    MessageId = @event.Id,
                    CorrelationId = @event.CorrelationId,
                    Subject = @event.EventType
                };

                await sender.SendMessageAsync(message, ct);

                // Mark as processed
                @event.ProcessedAt = DateTime.UtcNow;
                @event.Processed = true;
                
                await _outboxContainer.ReplaceItemAsync(
                    @event.Id, @event, new ItemRequestOptions(), ct);

                _logger.LogInformation(
                    "Published event {EventId} of type {EventType}",
                    @event.Id, @event.EventType);
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, 
                    "Failed to process event {EventId}", @event.Id);
                
                @event.RetryCount++;
                @event.LastError = ex.Message;
                
                await _outboxContainer.ReplaceItemAsync(
                    @event.Id, @event, new ItemRequestOptions(), ct);
            }
        }
    }
}

// Writing to outbox (same transaction as business data)
public async Task ShipOrderAsync(string orderId, string trackingNumber)
{
    var transactions = new List<TransactionalBatchOperation>();

    // Update order
    var orderUpdate = new { Status = "Shipped", TrackingNumber = trackingNumber };
    transactions.Add(_ordersContainer.CreateTransactionalBatchItem(
        orderId,
        JsonSerializer.Serialize(orderUpdate),
        new ItemRequestOptions { EnableContentResponseOnWrite = false }));

    // Write to outbox
    var outboxEvent = new OutboxEvent
    {
        Id = Guid.NewGuid().ToString(),
        EventType = "OrderShipped",
        TopicName = "order-events",
        Payload = JsonSerializer.Serialize(new 
        {
            OrderId = orderId,
            TrackingNumber = trackingNumber,
            ShippedAt = DateTime.UtcNow
        }),
        CorrelationId = orderId,
        CreatedAt = DateTime.UtcNow,
        Processed = false
    };
    
    transactions.Add(_outboxContainer.CreateTransactionalBatchItem(
        outboxEvent));

    // Execute atomically
    var response = await _container.CreateTransactionalBatch(transactions)
        .ExecuteAsync();

    if (!response.IsSuccessStatusCode)
        throw new Exception("Transaction failed");
}

SQL Server Implementation

-- Outbox table schema
CREATE TABLE OutboxMessages (
    Id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(),
    EventType NVARCHAR(100) NOT NULL,
    TopicName NVARCHAR(100) NOT NULL,
    Payload NVARCHAR(MAX) NOT NULL,
    CorrelationId NVARCHAR(100),
    CreatedAt DATETIME2 NOT NULL DEFAULT GETUTCDATE(),
    ProcessedAt DATETIME2,
    RetryCount INT DEFAULT 0,
    LastError NVARCHAR(MAX),
    Processed BIT DEFAULT 0
);

-- Index for efficient processing
CREATE INDEX IX_Outbox_Unprocessed 
    ON OutboxMessages (CreatedAt) 
    WHERE Processed = 0;
public class SqlOutboxProcessor
{
    private readonly string _connectionString;
    private readonly ServiceBusClient _serviceBusClient;

    public async Task ProcessOutboxAsync()
    {
        await using var connection = new SqlConnection(_connectionString);
        await connection.OpenAsync();

        await using var transaction = await connection.BeginTransactionAsync();

        try
        {
            // Get unprocessed events (with row lock)
            var events = await new SqlCommand(@"
                SELECT TOP 100 * FROM OutboxMessages 
                WHERE Processed = 0 
                ORDER BY CreatedAt
                FOR UPDATE SKIP LOCK", connection, (SqlTransaction)transaction)
                .ExecuteReaderAsync();

            var eventsToProcess = new List<OutboxMessage>();
            while (await events.ReadAsync())
            {
                eventsToProcess.Add(MapToOutboxMessage(events));
            }
            await events.CloseAsync();

            // Publish each event
            foreach (var @event in eventsToProcess)
            {
                await PublishEventAsync(@event);

                // Mark as processed
                await new SqlCommand(@"
                    UPDATE OutboxMessages 
                    SET Processed = 1, ProcessedAt = GETUTCDATE()
                    WHERE Id = @Id", connection, (SqlTransaction)transaction)
                {
                    Parameters = { new("@Id", @event.Id) }
                }.ExecuteNonQueryAsync();
            }

            await transaction.CommitAsync();
        }
        catch
        {
            await transaction.RollbackAsync();
            throw;
        }
    }
}

Enterprise Scaling

High-Throughput Architecture

┌─────────────────────────────────────────────────────────────────────┐
│              ENTERPRISE OUTBOX ARCHITECTURE                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   Multiple Processors (Scale Out)                                   │
│   ───────────────────────────────                                   │
│                                                                     │
│   ┌─────────────┐ ┌─────────────┐ ┌─────────────┐                   │
│   │  Processor  │ │  Processor  │ │  Processor  │                   │
│   │      1      │ │      2      │ │      3      │                   │
│   └──────┬──────┘ └──────┬──────┘ └──────┬──────┘                   │
│          │               │               │                          │
│          └───────────────┼───────────────┘                          │
│                          ▼                                          │
│              ┌─────────────────────┐                                │
│              │   Partitioned       │                                │
│              │   Outbox Table      │                                │
│              │   (by EventType)    │                                │
│              └─────────────────────┘                                │
│                                                                     │
│   Distribution:                                                     │
│   • Each processor handles specific event types                     │
│   • No duplicate processing (partitioning)                          │
│   • Scale independently based on volume                             │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Batch Processing

public class BatchOutboxProcessor
{
    private readonly int _batchSize = 100;
    private readonly TimeSpan _batchTimeout = TimeSpan.FromSeconds(30);

    public async Task ProcessBatchesAsync(CancellationToken ct)
    {
        var batch = new List<OutboxEvent>();
        
        using var cts = CancellationTokenSource
            .CreateLinkedTokenSource(ct);
        
        cts.CancelAfter(_batchTimeout);

        while (!ct.IsCancellationRequested)
        {
            var events = await GetNextBatchAsync(batchSize);
            
            if (events.Count == 0)
            {
                // Wait for new events
                await Task.Delay(TimeSpan.FromSeconds(1), ct);
                continue;
            }

            // Process batch
            await ProcessEventBatchAsync(events, ct);
        }
    }

    private async Task ProcessEventBatchAsync(
        List<OutboxEvent> events, CancellationToken ct)
    {
        // Group by topic for efficient sending
        var byTopic = events.GroupBy(e => e.TopicName);

        foreach (var group in byTopic)
        {
            var messages = group.Select(e => 
                new ServiceBusMessage(e.Payload) 
                { 
                    MessageId = e.Id 
                });

            var sender = _serviceBusClient.CreateSender(group.Key);
            
            await sender.SendMessagesAsync(messages, ct);
        }

        // Mark all as processed
        await MarkAsProcessedAsync(events.Select(e => e.Id));
    }
}

Idempotent Processing

public class IdempotentOutboxProcessor
{
    public async Task ProcessWithDeduplicationAsync(OutboxEvent @event)
    {
        // Check if already processed (idempotency key)
        var cacheKey = $"outbox:{@event.Id}";
        
        var isProcessing = await _distributedLock
            .AcquireAsync(cacheKey, TimeSpan.FromMinutes(5));

        if (!isProcessing)
        {
            // Already being processed by another node
            return;
        }

        try
        {
            await PublishEventAsync(@event);
            
            // Mark processed only after successful publish
            await MarkAsProcessedAsync(@event.Id);
        }
        finally
        {
            await _distributedLock.ReleaseAsync(cacheKey);
        }
    }

    // Also handle at subscriber level
    public async Task HandleEventAsync(ServiceBusReceivedMessage message)
    {
        var idempotencyKey = message.MessageId;

        // Check if already handled
        var handled = await _redis.GetAsync<bool>(
            $"handled:{idempotencyKey}");

        if (handled)
        {
            // Already processed - skip
            return;
        }

        // Process the event
        await ProcessEventAsync(message);

        // Mark as handled
        await _redis.SetAsync(
            $"handled:{idempotencyKey}", 
            true, 
            TimeSpan.FromDays(7));
    }
}

Monitoring and Operations

Key Metrics

{
  "outboxMetrics": {
    "lag": {
      "description": "Time from event creation to processing",
      "alert": "If > 5 minutes for 95th percentile"
    },
    "processingRate": {
      "description": "Events processed per second",
      "track": "Per event type"
    },
    "retryRate": {
      "description": "Percentage of events requiring retry",
      "alert": "If > 5%"
    },
    "deadLetter": {
      "description": "Events moved to DLQ after max retries",
      "alert": "Any dead letter needs investigation"
    },
    "failedEventTypes": {
      "description": "Which event types are failing",
      "alert": "Track per event type"
    }
  }
}

Alerting Rules

{
  "alertRules": [
    {
      "name": "OutboxLagHigh",
      "condition": "outbox_processing_lag_p95 > 300",
      "severity": "Warning",
      "action": "Investigate processor throughput"
    },
    {
      "name": "OutboxRetryHigh",
      "condition": "outbox_retry_rate > 0.05",
      "severity": "Warning",
      "action": "Check downstream service health"
    },
    {
      "name": "OutboxDeadLetter",
      "condition": "outbox_dead_letter_count > 0",
      "severity": "Critical",
      "action": "Immediate investigation required"
    }
  ]
}

Best Practices

Implementation Checklist

PracticeDescription
Same transactionOutbox write must be in same transaction as business logic
Idempotent consumersSubscribers must handle duplicate events
Process in orderEvents for same entity should maintain order
Handle failuresDead letter queue for permanently failed events
Monitor lagTrack time from creation to processing
Scale horizontallyPartition by event type for high throughput

Common Pitfalls

┌─────────────────────────────────────────────────────────────────────┐
│                  OUTBOX PATTERN PITFALLS                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ✗ Processing outbox in separate transaction (breaks atomicity)    │
│   ✗ Not handling duplicate events at subscriber                     │
│   ✗ Publishing before transaction commits                           │
│   ✗ No retry logic for failed publications                          │
│   ✗ Missing monitoring for outbox lag                               │
│   ✗ Processing events for same entity in parallel                   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Related Topics


Azure Integration Hub - Architect Level Enterprise Integration Patterns