← Back to ArticlesEvent Hubs

Event Hubs — Consumer Groups, Partition Balancing, and Scaling

Implementing scalable consumer groups, partition management, and proper scaling strategies for Azure Event Hubs.

Why Consumer Groups and Partitions Matter

Azure Event Hubs is a streaming platform, not a message queue. Understanding this distinction is the foundation for everything else in this article.

The Streaming Model vs. Traditional Queues

In a traditional queue (like Azure Service Bus queues), a message is consumed once and then removed. If three services need the same message, you need topics with three subscriptions. The broker tracks delivery state, handles acknowledgments, and removes messages after processing.

Event Hubs works differently. Events are appended to a log and retained for a configurable period (1–90 days, or indefinitely with Event Hubs Dedicated). Consumers read from the log at their own pace using an offset — a pointer to their position in the stream. The broker does not track who has read what; consumers manage their own position.

This means:

Why Partitions Exist

A single log would become a bottleneck. Partitions solve this by splitting the event stream into multiple parallel logs:

When a producer sends an event with a partition key (e.g., a device ID or tenant ID), Event Hubs hashes the key and routes all events with that key to the same partition. This guarantees ordering for related events without requiring a single global order.

Why Consumer Groups Exist

A consumer group represents a logical subscriber to the event hub. Each consumer group maintains its own independent read position across all partitions.

Without consumer groups, you'd have a single reader position. If your real-time alerting system and your analytics pipeline both need the same events, they'd fight over the offset. Consumer groups solve this — each gets its own view of the stream.

Think of it like a DVR recording: multiple people can watch the same show at different points, fast-forward, or rewind independently.


Architecture Overview

┌───────────────────────────────────────────────────────────────────────────┐
│                              PRODUCERS                                    │
│                                                                           │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐            │
│   │ IoT Dev  │    │ Web App  │    │ Service A│    │ Service B│            │
│   │ key:"d1" │    │ key:"u5" │    │ key:"d2" │    │ key:"u5" │            │
│   └────┬─────┘    └────┬─────┘    └────┬─────┘    └────┬─────┘            │
│        │               │               │               │                  │
└────────┼───────────────┼───────────────┼───────────────┼──────────────────┘
         │               │               │               │
         ▼               ▼               ▼               ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                         EVENT HUB: "telemetry-hub"                       │
│                                                                          │
│   ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐        │
│   │ Partition 0 │ │ Partition 1 │ │ Partition 2 │ │ Partition 3 │        │
│   │             │ │             │ │             │ │             │        │
│   │ [e1][e4][e7]│ │ [e2][e5]    │ │ [e3][e6][e8]│ │ [e9][e10]   │        │
│   │  offset: 3  │ │  offset: 2  │ │  offset: 3  │ │  offset: 2  │        │
│   └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘        │
│          │               │               │               │               │
└──────────┼───────────────┼───────────────┼───────────────┼───────────────┘
           │               │               │               │
     ┌─────┴───────────────┴───────────────┴───────────────┴─────┐
     │                                                           │
     ▼                                                           ▼
┌──────────────────────────────────────┐  ┌──────────────────────────────────┐
│  CONSUMER GROUP A: "realtime-alerts" │  │ CONSUMER GROUP B: "analytics"    │
│  (3 instances)                       │  │ (2 instances)                    │
│                                      │  │                                  │
│  ┌──────────┐  Owns: P0, P1          │  │  ┌──────────┐  Owns: P0, P1      │
│  │Instance 1│  Offset P0: 2          │  │  │Instance 1│  Offset P0: 1      │
│  │          │  Offset P1: 2          │  │  │          │  Offset P1: 2      │
│  └──────────┘                        │  │  └──────────┘                    │
│                                      │  │                                  │
│  ┌──────────┐  Owns: P2              │  │  ┌──────────┐  Owns: P2, P3      │
│  │Instance 2│  Offset P2: 1          │  │  │Instance 2│  Offset P2: 3      │
│  │          │  Offset P2: 1          │  │  │          │  Offset P3: 2      │
│  └──────────┘                        │  │  └──────────┘                    │
│                                      │  │                                  │
│  ┌──────────┐  Owns: P3              │  └──────────────────────────────────┘
│  │Instance 3│  Offset P3: 0          │
│  │          │  (behind — lag!)       │
│  └──────────┘                        │
│                                      │
└──────────────────────────────────────┘

Key observations:
• Each consumer group tracks offsets independently
• Consumer Group A is at different offsets than Consumer Group B
• Within a group, each partition is owned by exactly ONE instance
• Instance 3 in Group A has lag (offset 0 vs latest 2) — it's behind
• Group B's Instance 2 is fully caught up on P2 (offset 3 = latest)

How Consumer Groups Work

Independent Read Positions

Each consumer group maintains its own checkpoint (offset) for every partition. This means:

None of these interfere with each other. The events remain in the partition until the retention period expires, regardless of how many consumer groups have read them.

The $Default Consumer Group

Every Event Hub comes with a built-in consumer group called $Default. It's fine for development and simple scenarios, but in production you should create dedicated consumer groups for each logical subscriber. This provides:

Partition Distribution Within a Consumer Group

Within a single consumer group, partitions are distributed among consumer instances:

This is the fundamental scaling constraint: the number of partitions sets the maximum parallelism within a consumer group.

Consumer Group Limits

TierMax Consumer Groups per Event Hub
Basic1 ($Default only)
Standard20
Premium100
Dedicated1000

Step-by-Step Implementation

Creating Consumer Groups

Using Azure CLI:

# Create a consumer group
az eventhubs eventhub consumer-group create \
  --resource-group myResourceGroup \
  --namespace-name myNamespace \
  --eventhub-name telemetry-hub \
  --name realtime-alerts

# Create another consumer group for analytics
az eventhubs eventhub consumer-group create \
  --resource-group myResourceGroup \
  --namespace-name myNamespace \
  --eventhub-name telemetry-hub \
  --name analytics-pipeline

# List consumer groups
az eventhubs eventhub consumer-group list \
  --resource-group myResourceGroup \
  --namespace-name myNamespace \
  --eventhub-name telemetry-hub \
  --output table

EventProcessorClient Setup (C#)

The EventProcessorClient is the recommended way to consume events. It handles partition balancing, checkpointing, and failover automatically.

using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Consumer;
using Azure.Messaging.EventHubs.Processor;
using Azure.Storage.Blobs;

public class EventHubConsumerService : IHostedService
{
    private readonly EventProcessorClient _processor;
    private readonly ILogger<EventHubConsumerService> _logger;

    public EventHubConsumerService(IConfiguration config, ILogger<EventHubConsumerService> logger)
    {
        _logger = logger;

        // Blob container stores checkpoint data (offsets per partition)
        var blobClient = new BlobContainerClient(
            config["Storage:ConnectionString"],
            "eventhub-checkpoints");

        _processor = new EventProcessorClient(
            blobClient,
            "realtime-alerts",  // consumer group name
            config["EventHub:ConnectionString"],
            "telemetry-hub");   // event hub name

        // Register handlers
        _processor.ProcessEventAsync += ProcessEventAsync;
        _processor.ProcessErrorAsync += ProcessErrorAsync;
        _processor.PartitionInitializingAsync += PartitionInitializingAsync;
        _processor.PartitionClosingAsync += PartitionClosingAsync;
    }

    public async Task StartAsync(CancellationToken cancellationToken)
    {
        await _processor.StartProcessingAsync(cancellationToken);
        _logger.LogInformation("Event processor started");
    }

    public async Task StopAsync(CancellationToken cancellationToken)
    {
        await _processor.StopProcessingAsync(cancellationToken);
        _logger.LogInformation("Event processor stopped");
    }
}

Processing Events with Checkpointing

private async Task ProcessEventAsync(ProcessEventArgs args)
{
    if (args.CancellationToken.IsCancellationRequested)
        return;

    try
    {
        // Deserialize the event
        var eventBody = args.Data.EventBody.ToString();
        var telemetry = JsonSerializer.Deserialize<TelemetryEvent>(eventBody);

        // Process the event (your business logic)
        await ProcessTelemetryAsync(telemetry);

        // Checkpoint every 10 events to reduce storage calls
        if (args.Data.SequenceNumber % 10 == 0)
        {
            await args.UpdateCheckpointAsync();
            _logger.LogDebug(
                "Checkpointed partition {PartitionId} at sequence {Sequence}",
                args.Partition.PartitionId,
                args.Data.SequenceNumber);
        }
    }
    catch (Exception ex)
    {
        // Don't checkpoint on failure — event will be reprocessed
        _logger.LogError(ex,
            "Error processing event from partition {PartitionId}, sequence {Sequence}",
            args.Partition.PartitionId,
            args.Data.SequenceNumber);
    }
}

Handling Partition Lifecycle Events

private Task PartitionInitializingAsync(PartitionInitializingEventArgs args)
{
    _logger.LogInformation(
        "Partition {PartitionId} initializing. Default position: {Position}",
        args.PartitionId,
        args.DefaultStartingPosition);

    // Optionally override the starting position if no checkpoint exists
    if (args.DefaultStartingPosition == default)
    {
        args.DefaultStartingPosition = EventPosition.Latest;
        // Use EventPosition.Earliest to process all retained events
        // Use EventPosition.Latest to only process new events
    }

    return Task.CompletedTask;
}

private Task PartitionClosingAsync(PartitionClosingEventArgs args)
{
    _logger.LogInformation(
        "Partition {PartitionId} closing. Reason: {Reason}",
        args.PartitionId,
        args.Reason);

    // Reason can be: OwnershipLost, Shutdown, or other
    // Use this to flush buffers or release resources tied to a partition

    return Task.CompletedTask;
}

Error Handling

private Task ProcessErrorAsync(ProcessErrorEventArgs args)
{
    _logger.LogError(args.Exception,
        "Error in partition {PartitionId}, operation: {Operation}",
        args.PartitionId,
        args.Operation);

    // Common errors:
    // - EventHubsException with Reason == ResourceNotFound: hub doesn't exist
    // - EventHubsException with Reason == ConsumerGroupNotFound: group doesn't exist
    // - OperationCanceledException: processor is shutting down (safe to ignore)

    // The processor will automatically retry transient errors.
    // Only unrecoverable errors need manual intervention.

    return Task.CompletedTask;
}

Partition Balancing

How EventProcessorClient Distributes Partitions

The EventProcessorClient uses a cooperative balancing algorithm:

  1. Each processor instance periodically checks the ownership table (stored in Blob Storage).
  2. It counts total partitions and total active processors.
  3. It calculates the ideal distribution: partitions / processors (rounded up for some).
  4. If it owns fewer partitions than its fair share, it claims unclaimed partitions.
  5. If all partitions are claimed but distribution is uneven, it may steal a partition.

The ownership table in Blob Storage looks like this:

Container: eventhub-checkpoints
├── ownership/
│   ├── 0    → { "ownerId": "instance-A", "lastModified": "..." }
│   ├── 1    → { "ownerId": "instance-A", "lastModified": "..." }
│   ├── 2    → { "ownerId": "instance-B", "lastModified": "..." }
│   └── 3    → { "ownerId": "instance-C", "lastModified": "..." }
└── checkpoint/
    ├── 0    → { "offset": "1024", "sequenceNumber": 42 }
    ├── 1    → { "offset": "2048", "sequenceNumber": 87 }
    ├── 2    → { "offset": "512",  "sequenceNumber": 21 }
    └── 3    → { "offset": "768",  "sequenceNumber": 35 }

Rebalancing: When Consumers Join or Leave

New consumer joins (scale-out):

Before (2 consumers, 4 partitions):
  Instance A: P0, P1, P2, P3  ← overloaded
  Instance B: (just started)

Balancing cycle 1:
  Instance A: P0, P1, P2      ← releases P3
  Instance B: P3               ← claims P3

Balancing cycle 2:
  Instance A: P0, P1           ← releases P2
  Instance B: P2, P3           ← claims P2

Final state (balanced):
  Instance A: P0, P1
  Instance B: P2, P3

Consumer leaves (scale-in or crash):

Before (3 consumers, 4 partitions):
  Instance A: P0, P1
  Instance B: P2               ← crashes!
  Instance C: P3

After (ownership expires after ~30s):
  Instance A: P0, P1, P2      ← claims orphaned P2
  Instance C: P3

Next balancing cycle:
  Instance A: P0, P1           ← releases P2
  Instance C: P2, P3           ← claims P2

Partition Ownership and Lease Management

Ownership is maintained through blob leases with ETags for optimistic concurrency:

Handling Partition Steal Scenarios

When a partition is "stolen" (reassigned during rebalancing), your processor receives a PartitionClosingAsync event with Reason = OwnershipLost. Handle this gracefully:

private Task PartitionClosingAsync(PartitionClosingEventArgs args)
{
    if (args.Reason == ProcessingStoppedReason.OwnershipLost)
    {
        _logger.LogWarning(
            "Partition {PartitionId} was reassigned to another instance",
            args.PartitionId);

        // Flush any buffered data for this partition
        // Cancel any in-progress work for this partition
        // Do NOT checkpoint — the new owner will resume from last checkpoint
    }

    return Task.CompletedTask;
}

Load Balancing Strategies

The EventProcessorClient supports configuring the load balancing approach:

var options = new EventProcessorClientOptions
{
    // How often the processor checks for rebalancing
    LoadBalancingUpdateInterval = TimeSpan.FromSeconds(10),

    // How long before an inactive ownership is considered expired
    PartitionOwnershipExpirationInterval = TimeSpan.FromSeconds(30),

    // Balancing strategy
    LoadBalancingStrategy = LoadBalancingStrategy.Balanced
    // Balanced: gradually converge to even distribution (default)
    // Greedy: claim all available partitions immediately
};

Balanced strategy (default):

Greedy strategy:

ScenarioRecommended Strategy
Stable production workloadBalanced
Frequent auto-scalingGreedy
Crash recovery is criticalGreedy
Minimizing reprocessingBalanced
Development/testingGreedy

Scaling Strategies

Scaling Consumers: The Partition Ceiling

The fundamental rule: maximum useful consumers in a consumer group = number of partitions.

4 partitions, 2 consumers  → each consumer handles 2 partitions ✓
4 partitions, 4 consumers  → each consumer handles 1 partition  ✓ (max parallelism)
4 partitions, 6 consumers  → 4 active + 2 idle (wasted)         ✗

Plan your partition count based on your peak parallelism needs, not your current load.

When to Add Partitions

⚠️ Partitions can be added but never removed. This is a one-way operation.

Add partitions when:

# Increase partition count (cannot decrease!)
az eventhubs eventhub update \
  --resource-group myResourceGroup \
  --namespace-name myNamespace \
  --name telemetry-hub \
  --partition-count 8

# Check current partition count
az eventhubs eventhub show \
  --resource-group myResourceGroup \
  --namespace-name myNamespace \
  --name telemetry-hub \
  --query partitionCount

Impact of adding partitions:

Throughput Units and Partitions

ResourcePer Throughput Unit (TU)Per Partition
Ingress1 MB/s or 1000 events/s1 MB/s
Egress2 MB/s2 MB/s
Max partitions

Relationship: You need enough TUs to cover your aggregate throughput AND enough partitions to distribute the load. Rule of thumb:

Minimum TUs needed = max(total ingress MB/s, total egress MB/s ÷ 2)
Minimum partitions = max(desired consumer parallelism, TUs needed)

Auto-Inflate for Handling Spikes

Auto-inflate automatically scales throughput units up (but not down) when load increases:

# Enable auto-inflate with max 20 TUs
az eventhubs namespace update \
  --resource-group myResourceGroup \
  --name myNamespace \
  --enable-auto-inflate true \
  --maximum-throughput-units 20

Scaling Producers with Partition Keys

Choose partition keys carefully to distribute load evenly:

// Good: high-cardinality key distributes evenly
await producerClient.SendAsync(new[]
{
    new EventData(payload)
}, new SendEventOptions { PartitionKey = deviceId });

// Bad: low-cardinality key creates hot partitions
await producerClient.SendAsync(new[]
{
    new EventData(payload)
}, new SendEventOptions { PartitionKey = region }); // only 4 regions = uneven!

// Alternative: round-robin (no partition key) for max throughput
await producerClient.SendAsync(new[] { new EventData(payload) });
// Events distributed evenly but no ordering guarantee

Checkpointing Strategies

Checkpoint Frequency Trade-offs

Checkpointing writes the current offset to Blob Storage. The frequency creates a trade-off:

StrategyProsCons
Every eventMinimal reprocessing on crashHigh storage I/O, slower
Every N events (e.g., 10)BalancedUp to N events reprocessed
Every N secondsPredictable I/O costVariable reprocessing window
On batch completionAtomic batch semanticsEntire batch reprocessed on fail

Recommended approach — checkpoint every N events OR every N seconds, whichever comes first:

private int _eventsSinceCheckpoint = 0;
private DateTime _lastCheckpoint = DateTime.UtcNow;

private async Task ProcessEventAsync(ProcessEventArgs args)
{
    await ProcessBusinessLogicAsync(args.Data);

    _eventsSinceCheckpoint++;

    bool shouldCheckpoint =
        _eventsSinceCheckpoint >= 100 ||
        (DateTime.UtcNow - _lastCheckpoint) > TimeSpan.FromSeconds(30);

    if (shouldCheckpoint)
    {
        await args.UpdateCheckpointAsync();
        _eventsSinceCheckpoint = 0;
        _lastCheckpoint = DateTime.UtcNow;
    }
}

Exactly-Once vs. At-Least-Once

Event Hubs provides at-least-once delivery by default. After a crash, events between the last checkpoint and the crash point will be redelivered. True exactly-once requires your processing to be idempotent.

Idempotent processing pattern:

private async Task ProcessEventAsync(ProcessEventArgs args)
{
    var eventId = args.Data.MessageId
        ?? $"{args.Partition.PartitionId}-{args.Data.SequenceNumber}";

    // Check if already processed (using a deduplication store)
    if (await _deduplicationStore.HasBeenProcessedAsync(eventId))
    {
        _logger.LogDebug("Skipping duplicate event {EventId}", eventId);
        return;
    }

    // Process the event
    await ProcessBusinessLogicAsync(args.Data);

    // Mark as processed (ideally in the same transaction as your business logic)
    await _deduplicationStore.MarkProcessedAsync(eventId);

    await args.UpdateCheckpointAsync();
}

Transactional outbox pattern for database writes:

// Process event and record dedup key in a single database transaction
using var transaction = await _db.BeginTransactionAsync();

await _db.InsertTelemetryAsync(telemetry, transaction);
await _db.InsertProcessedEventAsync(eventId, transaction);

await transaction.CommitAsync();
await args.UpdateCheckpointAsync();

Real-World Scenarios

Scenario 1: IoT Telemetry Platform

Devices (100K+) → Event Hub (32 partitions)
                    ├── Consumer Group: "realtime-alerts" (8 instances)
                    │     └── Checks thresholds, triggers alerts within seconds
                    ├── Consumer Group: "analytics" (4 instances)
                    │     └── Aggregates to Azure Data Explorer every 5 minutes
                    ├── Consumer Group: "cold-storage" (2 instances)
                    │     └── Archives raw events to ADLS Gen2
                    └── Consumer Group: "device-state" (8 instances)
                          └── Maintains latest-known-state per device in Redis

Scenario 2: Click-Stream Processing

Web/Mobile Apps → Event Hub (16 partitions)
                    ├── Consumer Group: "realtime-dashboard" (4 instances)
                    │     └── Updates live user count, trending pages
                    ├── Consumer Group: "personalization" (8 instances)
                    │     └── Feeds ML model for real-time recommendations
                    └── Consumer Group: "batch-etl" (2 instances)
                          └── Loads to data warehouse every hour

Scenario 3: Log Aggregation

Microservices (50+) → Event Hub (8 partitions)
                        ├── Consumer Group: "search-index" (4 instances)
                        │     └── Indexes to Elasticsearch for search
                        ├── Consumer Group: "anomaly-detection" (4 instances)
                        │     └── ML-based anomaly detection pipeline
                        └── Consumer Group: "compliance" (2 instances)
                              └── Filters and archives PII-containing logs

Consumer Lag Monitoring

Consumer lag is the difference between the latest event in a partition and the consumer's current checkpoint. Growing lag means your consumer can't keep up.

Detecting Lag Programmatically

public async Task<Dictionary<string, long>> GetConsumerLagAsync(
    string connectionString,
    string eventHubName,
    string consumerGroup,
    BlobContainerClient checkpointContainer)
{
    var lag = new Dictionary<string, long>();

    await using var consumer = new EventHubConsumerClient(
        consumerGroup, connectionString, eventHubName);

    var partitionIds = await consumer.GetPartitionIdsAsync();

    foreach (var partitionId in partitionIds)
    {
        var properties = await consumer.GetPartitionPropertiesAsync(partitionId);
        var lastEnqueued = properties.LastEnqueuedSequenceNumber;

        // Read checkpoint from blob storage
        var checkpointBlob = checkpointContainer.GetBlobClient(
            $"{eventHubName}/{consumerGroup}/checkpoint/{partitionId}");

        long consumerSequence = 0;
        if (await checkpointBlob.ExistsAsync())
        {
            var props = await checkpointBlob.GetPropertiesAsync();
            if (props.Value.Metadata.TryGetValue("sequenceNumber", out var seq))
                consumerSequence = long.Parse(seq);
        }

        lag[partitionId] = lastEnqueued - consumerSequence;
    }

    return lag;
}

Lag Alert Thresholds

Lag LevelEvents BehindAction
Healthy< 100No action needed
Warning100–1,000Monitor — may need more consumers
Critical1,000–10,000Scale out consumers immediately
Emergency> 10,000Risk of data loss (retention expiry)

Best Practices

AreaRecommendation
Partition countStart with 4–8; scale to 32 for high-throughput workloads
Consumer groupsOne per logical subscriber; never share $Default in production
Partition keysUse high-cardinality keys (device ID, user ID) for even spread
CheckpointingEvery 100 events or 30 seconds, whichever comes first
Error handlingNever let exceptions escape ProcessEventAsync
IdempotencyDesign all consumers to handle duplicate events
ScalingSet partition count to your peak parallelism need
MonitoringAlert on consumer lag > 1000 events
Blob storageUse a dedicated container per consumer group for checkpoints
Graceful shutdownCall StopProcessingAsync to checkpoint before exit

Common Pitfalls

1. More consumers than partitions Adding a 5th consumer to a 4-partition hub wastes resources. The extra consumer sits idle.

2. Checkpointing inside a try/catch that swallows errors If you checkpoint after a failed processing attempt, you'll skip events permanently.

3. Using $Default for multiple applications Two apps sharing $Default will fight over partition ownership. Create separate consumer groups.

4. Low-cardinality partition keys Using "region" (4 values) as a partition key with 32 partitions means 28 partitions stay empty.

5. Not handling OwnershipLost When a partition is reassigned, in-flight work should be cancelled — not completed and checkpointed.

6. Synchronous processing blocking the event loop The EventProcessorClient uses async handlers. Blocking calls (.Result, .Wait()) can deadlock.

7. Forgetting that partitions can't be removed Adding partitions is irreversible. Start conservative and scale up based on measured need.

8. Not accounting for reprocessing after crashes If you checkpoint every 100 events and crash at event 99, all 99 events will be redelivered. Design your processing to be idempotent.


Monitoring with KQL

Consumer Lag per Partition

// Query Azure Monitor / Event Hubs metrics
AzureMetrics
| where ResourceProvider == "MICROSOFT.EVENTHUB"
| where MetricName == "IncomingMessages"
| summarize TotalIncoming = sum(Total) by bin(TimeGenerated, 5m), EntityName
| join kind=leftouter (
    AzureMetrics
    | where MetricName == "OutgoingMessages"
    | summarize TotalOutgoing = sum(Total) by bin(TimeGenerated, 5m), EntityName
) on TimeGenerated, EntityName
| extend Lag = TotalIncoming - TotalOutgoing
| project TimeGenerated, EntityName, TotalIncoming, TotalOutgoing, Lag
| order by TimeGenerated desc

Throughput Monitoring

// Incoming vs outgoing bytes over time
AzureMetrics
| where ResourceProvider == "MICROSOFT.EVENTHUB"
| where MetricName in ("IncomingBytes", "OutgoingBytes")
| summarize AvgBytes = avg(Average) by bin(TimeGenerated, 5m), MetricName
| render timechart

Throttling Detection

// Detect throttled requests (exceeded TU limits)
AzureMetrics
| where ResourceProvider == "MICROSOFT.EVENTHUB"
| where MetricName == "ThrottledRequests"
| where Total > 0
| summarize ThrottleCount = sum(Total) by bin(TimeGenerated, 5m), EntityName
| order by TimeGenerated desc

Partition Distribution Health

// Custom log from your application (if you emit partition ownership events)
CustomLogs_CL
| where Category == "EventHubProcessor"
| where Message contains "partition"
| summarize PartitionsOwned = dcount(PartitionId) by InstanceId, bin(TimeGenerated, 1m)
| render columnchart

Summary

Consumer groups and partitions are the two levers that control how you scale Event Hubs consumption:

Start with a clear mental model: Event Hubs is a partitioned, append-only log. Consumer groups are independent cursors into that log. Everything else — balancing, checkpointing, scaling — follows from these two concepts.