From beginner to architecture level — partitions, consumer groups, Capture, Kafka protocol, Schema Registry, Function Apps, Stream Analytics, security, monitoring, geo-DR, and every real-time streaming pattern you need.
Azure Event Hubs is a fully managed, real-time data ingestion service capable of receiving and processing millions of events per second. It acts as the "front door" for an event pipeline — a highly scalable publish-subscribe system for streaming telemetry, logs, clickstreams, IoT data, and any high-throughput event stream. It is not a queue; it is a log.
📡 Massive Ingestion: Ingest millions of events per second from any source (devices, apps, services, external systems).
🗂️ Partitioned Log: Events are appended to ordered, immutable partition logs, and consumers read at their own pace.
☕ Kafka Compatible: Exposes a Kafka-protocol endpoint, so most existing Kafka producers and consumers connect with configuration changes only.
💾 Capture to Storage: Automatically archive all events to Blob Storage or Data Lake in Avro or Parquet format.
🔁 Replay: Events are retained for up to 90 days depending on tier (1 day on Basic, up to 7 days on Standard), and any consumer group can replay the stream independently.
⚖️ Auto-Inflate: Automatically scales throughput units up when needed, with no manual intervention.
02 Core Concepts & Terminology
Understanding Event Hubs terminology is essential for designing efficient streaming architectures. These concepts map directly to how data flows through partitions, how consumers track progress, and how you scale throughput for real-time workloads.
| Term | Definition |
| --- | --- |
| Namespace | Container for Event Hubs. Unique DNS: <namespace>.servicebus.windows.net |
| Event Hub | A named stream within a namespace, analogous to a Kafka topic |
| Partition | Ordered, immutable sequence of events within an Event Hub. Events are distributed across partitions. |
| Event | The data unit: a byte array payload plus metadata (properties, offset, sequence number, timestamp) |
| Producer | Any client that publishes events to an Event Hub |
| Consumer | Any client that reads events from a partition |
| Consumer Group | A logical view of the entire Event Hub. Each consumer group tracks its own offsets independently. |
| Offset | Position of an event within a partition. Consumers checkpoint their offset to resume after restart. |
| Sequence Number | Monotonically increasing integer per partition that uniquely identifies event position |
| Throughput Unit (TU) | Standard tier capacity unit: 1 MB/s ingress, 2 MB/s egress per TU |
| Processing Unit (PU) | Premium tier capacity unit, more powerful than a TU |
| Checkpoint | Consumer saves its current offset so it can resume from that position after restart |
| EventProcessorClient | SDK client that handles partition distribution, checkpointing, and load balancing automatically |
03 Tiers: Basic / Standard / Premium / Dedicated
Choosing the right tier determines your throughput ceiling, retention limits, and network isolation capabilities. Each tier is designed for a different stage of your streaming journey — from development prototyping to enterprise-grade production workloads requiring dedicated infrastructure and compliance guarantees.
| Feature | Basic | Standard | Premium | Dedicated |
| --- | --- | --- | --- | --- |
| Consumer Groups | 1 (default) | Up to 20 | Up to 100 | Unlimited |
| Brokered Connections | 100 | 1,000 | 10,000 | Unlimited |
| Message Retention | 1 day | Up to 7 days | Up to 90 days | Up to 90 days |
| Capture | ✗ No | ✓ Yes | ✓ Yes | ✓ Yes |
| Schema Registry | ✗ No | ✓ Yes | ✓ Yes | ✓ Yes |
| Kafka Protocol | ✗ No | ✓ Yes | ✓ Yes | ✓ Yes |
| VNet / Private Endpoints | ✗ No | ✓ Yes | ✓ Yes | ✓ Yes |
| Dynamic Partition Scale | ✗ No | ✗ No | ✓ Yes | ✓ Yes |
| Availability Zones | ✗ No | ✓ Yes | ✓ Yes | ✓ Yes |
| Max TUs / PUs | 20 TUs | 40 TUs | 16 PUs | Custom CUs |
| Pricing model | Per TU/hr | Per TU/hr | Per PU/hr | Per CU/hr |
✅ Which Tier to Choose?
Use Basic for dev/test only. Use Standard for most production streaming workloads. Use Premium when you need longer retention (>7d), dynamic partition scaling, or stronger isolation. Use Dedicated for compliance, highest throughput, or complete tenant isolation.
04 Partitions & Throughput Units
Partitions are the core scalability unit. Each partition is an ordered, append-only log. Events are distributed across partitions either round-robin (no key) or by partition key (the same key always goes to the same partition, which preserves ordering per entity).
| Concept | Detail |
| --- | --- |
| Default partitions | 4 (set when the Event Hub is created; cannot be changed later on Standard) |
| Max partitions (Standard) | 32 |
| Max partitions (Premium/Dedicated) | 2,000 (dynamically scalable on Premium) |
| Partition key | String hashed to select a partition; guarantees ordering per key |
| Round-robin (no key) | Events distributed across all partitions; max throughput, no ordering |
| 1 TU capacity | 1 MB/s or 1,000 events/s ingress (whichever limit is hit first); 2 MB/s egress per TU |
| Auto-inflate | Standard: automatically scales TUs up (not down) when throttled |
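The per-TU limits in the table translate directly into a sizing calculation. The sketch below is a rough estimate rather than an official sizing tool: `required_tus` is a hypothetical helper, it assumes 1 MB = 1024 KB, and it assumes every consumer group reads the full stream (so egress grows with the number of consumer groups).

```python
import math

# Per-TU limits quoted in the table above (Standard tier).
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1000
EGRESS_MB_PER_TU = 2.0


def required_tus(events_per_sec: float, avg_event_kb: float, consumer_groups: int = 1) -> int:
    """Return the smallest TU count that covers ingress bytes, ingress events, and egress."""
    ingress_mb = events_per_sec * avg_event_kb / 1024
    # Assumption: each consumer group reads every event once.
    egress_mb = ingress_mb * consumer_groups
    by_bytes = math.ceil(ingress_mb / INGRESS_MB_PER_TU)
    by_events = math.ceil(events_per_sec / INGRESS_EVENTS_PER_TU)
    by_egress = math.ceil(egress_mb / EGRESS_MB_PER_TU)
    return max(1, by_bytes, by_events, by_egress)


if __name__ == "__main__":
    # 5,000 events/s of 1 KB each, one consumer group
    print(required_tus(5000, 1.0))
    # 500 events/s of 4 KB each, three consumer groups (egress dominates)
    print(required_tus(500, 4.0, consumer_groups=3))
```

Whichever of the three limits is hit first determines the result, which is why small events can be bounded by the events/s limit and fan-out-heavy designs by egress.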
Partition Key Usage
csharp
using System.Text;
using System.Text.Json;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

// Application properties are metadata only; they do not affect routing
var eventData = new EventData(Encoding.UTF8.GetBytes(JsonSerializer.Serialize(telemetry)))
{
    Properties =
    {
        ["DeviceId"] = "sensor-42",
        ["Region"] = "EU-West"
    }
};
// Send with a partition key: events with the same key go to the same partition, in order
await producerClient.SendAsync(
    new[] { eventData },
    new SendEventOptions { PartitionKey = "sensor-42" });
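To see why a partition key gives per-key ordering, it helps to model the routing step. The sketch below is illustrative only: Event Hubs uses its own internal hash, and `pick_partition` with SHA-256 is a hypothetical stand-in that shares the one property that matters here (the same key always maps to the same partition).

```python
import hashlib


def pick_partition(partition_key: str, partition_count: int) -> int:
    """Map a key to a partition index with a stable hash (not the service's real hash)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


if __name__ == "__main__":
    for key in ["sensor-42", "sensor-43", "sensor-42"]:
        # The two "sensor-42" lines always print the same partition index.
        print(key, "->", pick_partition(key, partition_count=4))
```

Because the mapping depends on the partition count, changing the count would re-route existing keys, which is one reason the count deserves care up front.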
⚠️ Partition Count is Immutable
On Standard tier, the partition count cannot be changed after the Event Hub is created. Choose carefully: too few partitions limits parallelism. Premium supports dynamic scaling.
05 Sending & Receiving Events
Producing and consuming events is the fundamental interaction with Event Hubs. The SDK provides batching for efficient ingress and the EventProcessorClient for scalable, fault-tolerant consumption with automatic partition balancing and checkpointing across consumer instances.
Send Events — .NET SDK
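csharp
using System;
using System.Text;
using System.Text.Json;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;
var connectionString = "Endpoint=sb://<namespace>.servicebus.windows.net/;...";
var eventHubName = "telemetry";
await using var producerClient = new EventHubProducerClient(connectionString, eventHubName);
// Create a batch (TryAdd enforces the size limit)
EventDataBatch eventBatch = await producerClient.CreateBatchAsync();
try
{
    for (int i = 0; i < 100; i++)
    {
        var payload = JsonSerializer.Serialize(new { deviceId = "d1", temp = 22.5 + i, ts = DateTime.UtcNow });
        var eventData = new EventData(Encoding.UTF8.GetBytes(payload));
        eventData.Properties["EventType"] = "TemperatureReading";
        if (!eventBatch.TryAdd(eventData))
        {
            // Batch full: send it, then retry the rejected event in a fresh batch
            await producerClient.SendAsync(eventBatch);
            eventBatch.Dispose();
            eventBatch = await producerClient.CreateBatchAsync();
            if (!eventBatch.TryAdd(eventData))
                throw new InvalidOperationException("Event is too large for an empty batch.");
        }
    }
    if (eventBatch.Count > 0)
        await producerClient.SendAsync(eventBatch);
}
finally
{
    eventBatch.Dispose();
}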
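When a batch rejects an event, the usual pattern is to send the full batch and retry the rejected event in a fresh one, so nothing is dropped. The following is a minimal stdlib-only sketch of that loop; `split_into_batches` and its byte-length sizing are hypothetical stand-ins for the SDK's batch object, which accounts for size differently.

```python
from typing import Iterable, Iterator, List


def split_into_batches(events: Iterable[bytes], max_batch_bytes: int) -> Iterator[List[bytes]]:
    """Group events into batches that never exceed max_batch_bytes."""
    batch: List[bytes] = []
    size = 0
    for event in events:
        if len(event) > max_batch_bytes:
            # Mirrors the case where an event does not fit even in an empty batch.
            raise ValueError("event larger than an empty batch")
        if size + len(event) > max_batch_bytes:
            # Batch full: hand it off, then start a new one with this event.
            yield batch
            batch, size = [], 0
        batch.append(event)
        size += len(event)
    if batch:
        # Flush the final, partially filled batch.
        yield batch


if __name__ == "__main__":
    events = [b"x" * 4 for _ in range(5)]
    for i, b in enumerate(split_into_batches(events, max_batch_bytes=10)):
        print(f"batch {i}: {len(b)} events")
```

The key detail is that the event that did not fit is appended to the new batch rather than discarded.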
Receive Events with EventProcessorClient (Recommended)
csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Storage.Blobs;

// Blob container that stores checkpoints and partition ownership
var storageClient = new BlobContainerClient(storageConnStr, "checkpoints");
var processorClient = new EventProcessorClient(
    storageClient,
    "$Default", // consumer group
    connectionString,
    eventHubName);
processorClient.ProcessEventAsync += async args =>
{
    string body = args.Data.EventBody.ToString();
    Console.WriteLine($"Partition {args.Partition.PartitionId}: {body}");
    Console.WriteLine($"Offset: {args.Data.Offset}, SeqNo: {args.Data.SequenceNumber}");
    // Checkpointing after every event is simple but slow;
    // in production, checkpoint every N events or on a timer
    await args.UpdateCheckpointAsync();
};
processorClient.ProcessErrorAsync += args =>
{
    Console.WriteLine($"Error on partition {args.PartitionId}: {args.Exception}");
    return Task.CompletedTask;
};
await processorClient.StartProcessingAsync();
Console.ReadKey();
await processorClient.StopProcessingAsync();
Send Events — Python SDK
python
import json

from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://...",
    eventhub_name="telemetry",
)
with producer:
    # All events in this batch share one partition key, so they land on the same partition
    event_data_batch = producer.create_batch(partition_key="sensor-42")
    for i in range(100):
        payload = json.dumps({"deviceId": "sensor-42", "temp": 22.5 + i})
        # add() raises ValueError once the batch is full; 100 small events fit comfortably
        event_data_batch.add(EventData(payload))
    producer.send_batch(event_data_batch)
Event Properties Reference
| Property | Type | Description |
| --- | --- | --- |
| EventBody | BinaryData | The raw event payload (your data) |
| Offset | long | Position in partition stream, unique within the partition |
| SequenceNumber | long | Monotonically increasing number per partition |
| EnqueuedTime | DateTimeOffset | UTC timestamp when event was accepted by Event Hubs |
| PartitionKey | string | Key used to route event to partition (read-only on receive) |
06 Consumer Groups
A consumer group is an independent view of the entire Event Hub stream. Each consumer group has its own offset pointers per partition, so multiple downstream systems can each read the full event stream independently without interfering with each other.
| Scenario | Consumer Group |
| --- | --- |
| Stream Analytics job (analytics) | analytics-cg |
| Azure Function (real-time processing) | processor-cg |
| Archival service (cold path) | archive-cg |
| ML model inference | ml-inference-cg |
| Debugging / replay | debug-cg |
⚠️ One Reader Per Partition Per Consumer Group
At any point, only ONE active reader should own each partition within a consumer group. EventProcessorClient handles this automatically with distributed leasing. Multiple readers in the same consumer group reading the same partition will cause duplicate processing.
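The independence of consumer groups follows from the log model: reading never removes events, and each group only moves its own position. The sketch below is a toy in-memory model under that assumption; `PartitionLog` and its methods are hypothetical names and not part of any SDK.

```python
from collections import defaultdict


class PartitionLog:
    """Append-only log for one partition; reading never removes events."""

    def __init__(self):
        self.events = []
        # checkpoints[consumer_group] = index of the next event to read
        self.checkpoints = defaultdict(int)

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # position of the appended event

    def read(self, consumer_group, max_events=10):
        start = self.checkpoints[consumer_group]
        return self.events[start:start + max_events]

    def checkpoint(self, consumer_group, next_index):
        self.checkpoints[consumer_group] = next_index


if __name__ == "__main__":
    log = PartitionLog()
    for e in ["e0", "e1", "e2"]:
        log.append(e)

    print(log.read("analytics-cg", max_events=2))  # first two events
    log.checkpoint("analytics-cg", 2)
    print(log.read("archive-cg"))                  # unaffected by analytics-cg
    log.checkpoint("analytics-cg", 0)              # rewind to replay from the start
    print(log.read("analytics-cg"))
```

Rewinding a checkpoint is all a replay amounts to in this model, which is why a dedicated debug consumer group can re-read history without disturbing production readers.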
07 Capture
Capture automatically archives all incoming events to Azure Blob Storage or Azure Data Lake Storage Gen2 in Avro format (Parquet also supported). This enables both real-time stream processing and batch analytics on the same data: the classic Lambda architecture hot/cold path.
Read Captured Avro Files
Use Azure Synapse Analytics, Databricks, or the Apache.Avro NuGet package to deserialize captured files. Event Hubs Capture files include a schema header and event body records.
08 Schema Registry
The Event Hubs Schema Registry is a centralized repository for managing event schemas (Avro, JSON Schema, Protobuf). Producers register schemas; consumers validate incoming events against registered schemas — ensuring schema governance across producers and consumers without out-of-band coordination.
| Feature | Description |
| --- | --- |
| Schema Groups | Logical container for related schemas with a compatibility mode |
| Compatibility Modes | None, Backward, Forward, Full; controls what schema changes are allowed |
| Schema Versioning | Each schema registration gets a unique version, so old consumers still work |
| Supported Formats | Avro, JSON Schema, Protobuf (preview) |
| Cached on client | Schemas are cached locally, so there is no registry lookup on every event |
Send with Avro Schema Validation
csharp
using Azure.Data.SchemaRegistry;
using Azure.Identity;
using Azure.Messaging.EventHubs;
using Microsoft.Azure.Data.SchemaRegistry.ApacheAvro;

var schemaRegistryClient = new SchemaRegistryClient(
    "<namespace>.servicebus.windows.net",
    new DefaultAzureCredential());
var serializer = new SchemaRegistryAvroSerializer(
    schemaRegistryClient,
    "my-schema-group", // schema group name
    new SchemaRegistryAvroSerializerOptions { AutoRegisterSchemas = true });
// Order is assumed to be an Avro-generated class
var order = new Order { OrderId = "123", Amount = 99.99 };
var eventData = await serializer.SerializeAsync<EventData, Order>(order);
await producerClient.SendAsync(new[] { eventData });
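The Backward compatibility mode can be illustrated with a deliberately simplified check: a new schema can read data written with the old one if every field it adds carries a default and no shared field changes type. This is a sketch of the idea only; `is_backward_compatible` and the dict-based schema shape are hypothetical, and real Avro resolution also allows type promotions and aliases that are ignored here.

```python
def is_backward_compatible(old_fields: dict, new_fields: dict) -> bool:
    """Fields map name -> {"type": str, optional "default": value}."""
    for name, spec in new_fields.items():
        if name not in old_fields:
            # A field unknown to old writers needs a default to fill the gap.
            if "default" not in spec:
                return False
        elif old_fields[name]["type"] != spec["type"]:
            # Simplification: any type change is treated as breaking.
            return False
    # Fields removed in the new schema are simply ignored when reading old data.
    return True


if __name__ == "__main__":
    v1 = {"OrderId": {"type": "string"}, "Amount": {"type": "double"}}
    v2 = {**v1, "Currency": {"type": "string", "default": "USD"}}
    v3 = {**v1, "Currency": {"type": "string"}}
    print(is_backward_compatible(v1, v2))  # added field has a default
    print(is_backward_compatible(v1, v3))  # added field lacks a default
```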
09 Kafka Protocol Support
Event Hubs Standard and above expose a Kafka-compatible endpoint on port 9093. Existing Kafka producers and consumers can point at Event Hubs with minimal config changes — no code changes required for most workloads.
Kafka Producer Config (Java)
properties
bootstrap.servers=<namespace>.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="$ConnectionString" \
password="Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...";
# Event Hub name = Kafka topic name
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
| Kafka Concept | Event Hubs Equivalent |
| --- | --- |
| Broker | Event Hubs namespace endpoint |
| Topic | Event Hub |
| Partition | Partition |
| Consumer Group | Consumer Group |
| Offset | Offset |
| Topic Retention | Message retention (1–90 days) |
| Replication Factor | Managed internally, not user-configurable |
| __consumer_offsets | Managed internally by Event Hubs |
☕ Unsupported Kafka Features
Kafka feature coverage is not complete. Kafka Streams, compacted topics, and transactions have historically been unsupported or limited to specific tiers, and Kafka Connect needs Event Hubs-specific configuration. Check the compatibility matrix in Microsoft docs before migrating complex Kafka workloads.
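For non-Java clients, the same settings as the Java properties above can be expressed as a plain dictionary. The sketch below assumes librdkafka-style key names (as used by clients such as confluent-kafka); `kafka_config_for_event_hubs` is a hypothetical helper and the namespace and connection string are placeholders.

```python
def kafka_config_for_event_hubs(namespace: str, connection_string: str) -> dict:
    """Build Kafka client settings that point at the Event Hubs Kafka endpoint."""
    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The literal user name "$ConnectionString" tells Event Hubs that
        # the password field carries a full connection string.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


if __name__ == "__main__":
    cfg = kafka_config_for_event_hubs("<namespace>", "Endpoint=sb://<namespace>.servicebus.windows.net/;...")
    for key, value in cfg.items():
        # Avoid printing the secret in real code.
        print(key, "=", "***" if key == "sasl.password" else value)
```

The Kafka topic name passed to the producer or consumer is the Event Hub name.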
10 Security & Authentication
Securing your Event Hubs namespace is critical since it often carries sensitive telemetry and business data at high volume. Prefer Managed Identity with Entra ID RBAC over connection strings — this eliminates secret rotation overhead and reduces the blast radius of credential leaks in real-time streaming pipelines.
| Method | Type | Recommended? |
| --- | --- | --- |
| Connection String / SAS | Shared Access Signature | ⚠️ Dev/test only |
| SAS Policy (scoped) | Shared key scoped to namespace/hub | ✓ Acceptable |
| Managed Identity | Entra ID token | ✅ Recommended |
| Service Principal | Entra ID token | ✅ Recommended |
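To make the SAS rows in the table concrete, here is how a SAS token is typically derived from a shared key: an HMAC-SHA256 signature over the URL-encoded resource URI and an expiry time. This is a minimal sketch; `generate_sas_token`, the `now` parameter (added so the output is reproducible), and the example names are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def generate_sas_token(resource_uri, key_name, key, ttl_seconds=3600, now=None):
    """Return a SharedAccessSignature token string for the given resource URI."""
    issued_at = time.time() if now is None else now
    expiry = str(int(issued_at + ttl_seconds))
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # The signed string is the encoded URI, a newline, then the expiry.
    string_to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={expiry}&skn={key_name}"


if __name__ == "__main__":
    token = generate_sas_token(
        "https://<namespace>.servicebus.windows.net/telemetry",
        key_name="send-policy",   # hypothetical policy name
        key="<shared-access-key>",
    )
    print(token.split("&sig=")[0])  # print only the non-secret prefix
```

Anyone holding the key can mint tokens like this until the key is rotated, which is the main reason the table prefers Entra ID identities for production.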
Managed Identity — Zero Secrets
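csharp
using Azure.Identity;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

// Uses DefaultAzureCredential: works with Managed Identity, Visual Studio, and Azure CLI logins
var credential = new DefaultAzureCredential();
var producerClient = new EventHubProducerClient(
    "<namespace>.servicebus.windows.net",
    "telemetry",
    credential);
// checkpointStore is a BlobContainerClient for the checkpoint container, created elsewhere
var processorClient = new EventProcessorClient(
    checkpointStore,
    "$Default",
    "<namespace>.servicebus.windows.net",
    "telemetry",
    credential);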
RBAC Roles
| Role | Permissions |
| --- | --- |
| Azure Event Hubs Data Owner | Full access: send, receive, manage |
| Azure Event Hubs Data Sender | Send (produce) events only |
| Azure Event Hubs Data Receiver | Receive (consume) events only |
| Schema Registry Contributor | Read and write schemas |
| Schema Registry Reader | Read schemas only |
bash
# Assign Sender role to a Managed Identity
az role assignment create \
--assignee <principal-id> \
--role "Azure Event Hubs Data Sender" \
--scope /subscriptions/<sub>/resourceGroups/<rg>/providers/\
Microsoft.EventHub/namespaces/<namespace>/eventhubs/<eventhub>
11 Network Security & VNet
Network isolation ensures your streaming data never traverses the public internet. For production workloads handling sensitive real-time data, use Private Endpoints to route traffic exclusively through your VNet — this is especially important for compliance-driven architectures where data exfiltration risks must be minimized.
| Feature | Basic | Standard | Premium | Dedicated |
| --- | --- | --- | --- | --- |
| IP Filtering | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes |
| VNet Service Endpoints | ✗ No | ✓ Yes | ✓ Yes | ✓ Yes |
| Private Endpoints | ✗ No | ✓ Yes | ✓ Yes | ✓ Yes |
| Disable Public Network Access | ✗ No | ✗ No | ✓ Yes | ✓ Yes |
Private Endpoint via CLI
bash
# Create private endpoint for Event Hubs namespace
az network private-endpoint create \
--name myEventHubPrivateEndpoint \
--resource-group myRG \
--vnet-name myVNet \
--subnet mySubnet \
--private-connection-resource-id /subscriptions/<sub>/resourceGroups/<rg>/providers/\
Microsoft.EventHub/namespaces/<namespace> \
--group-ids namespace \
--connection-name myPrivateEndpointConnection
# Create Private DNS Zone for resolution
az network private-dns zone create \
--resource-group myRG \
--name privatelink.servicebus.windows.net
12 Logic Apps Integration
Logic Apps provides a low-code way to react to Event Hub events and orchestrate downstream workflows. Use it for moderate-volume scenarios where you need to fan out events to multiple SaaS connectors, transform payloads, or trigger approval workflows — all without writing custom consumer code.
| Action / Trigger | Direction | Description |
| --- | --- | --- |
| When events are available in Event Hub | Trigger | Poll Event Hub on an interval; fire when events exist |
Logic Apps vs Functions for Event Hubs
For high-throughput event processing, prefer Azure Functions with EventHubTrigger, which scales automatically per partition. Use Logic Apps for lower-volume orchestration workflows that need to react to events and call many downstream services.
13 Function Apps Integration
Azure Functions is the most common compute layer for Event Hubs processing. The EventHubTrigger automatically scales function instances per partition, handles checkpointing, and supports batch processing — making it ideal for high-throughput real-time event transformation, enrichment, and routing.
| Setting | Description | Default |
| --- | --- | --- |
| prefetchCount | Events pre-fetched from partition into local buffer | 300 |
| batchCheckpointFrequency | Checkpoint every N batches | 1 |
| initialOffsetOptions.type | fromStart, fromEnd, fromEnqueuedTime | fromStart |
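These trigger settings live in the Function App's host.json. The sketch below generates such a file with the defaults listed above; it assumes the version 2.0 host.json layout with the settings nested under `extensions.eventHubs`, so confirm the exact shape against the extension version you run. `build_host_json` is a hypothetical helper.

```python
import json


def build_host_json(prefetch_count=300, batch_checkpoint_frequency=1, initial_offset="fromStart"):
    """Return host.json content (as text) with Event Hubs trigger settings."""
    config = {
        "version": "2.0",
        "extensions": {
            "eventHubs": {
                "prefetchCount": prefetch_count,
                "batchCheckpointFrequency": batch_checkpoint_frequency,
                "initialOffsetOptions": {"type": initial_offset},
            }
        },
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Checkpoint less often to trade replay-on-restart for fewer storage writes.
    print(build_host_json(batch_checkpoint_frequency=5))
```

A higher checkpoint frequency value reduces storage traffic but means more events are reprocessed after a restart, so handlers should be idempotent.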
14 Stream Analytics Integration
Azure Stream Analytics uses Event Hubs as its primary streaming input. Define SQL-like queries to filter, aggregate, join, and transform streams in real time — with outputs to SQL, Cosmos DB, Blob Storage, Power BI, Service Bus, and more.
sql
-- Real-time aggregation: avg temperature per device per 5-minute tumbling window
-- EventEnqueuedUtcTime is the enqueue timestamp that Stream Analytics exposes for Event Hub inputs
SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemp,
    MAX(Temperature) AS MaxTemp,
    MIN(Temperature) AS MinTemp,
    COUNT(*) AS EventCount,
    System.Timestamp() AS WindowEnd
INTO [sql-output]
FROM [eventhub-input] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY
    DeviceId,
    TumblingWindow(minute, 5)

-- Alert when temperature exceeds threshold
SELECT
    DeviceId,
    Temperature,
    System.Timestamp() AS AlertTime
INTO [servicebus-alerts]
FROM [eventhub-input] TIMESTAMP BY EventEnqueuedUtcTime
WHERE Temperature > 85.0
| Window Type | Description | Use Case |
| --- | --- | --- |
| Tumbling | Fixed-size, non-overlapping windows | Regular aggregation (every 5 min) |
| Hopping | Fixed-size, overlapping windows | Moving average (5-min window, 1-min hop) |
| Sliding | Fires whenever an event enters or leaves the window | Continuous metric tracking |
| Session | Groups events by activity gaps | User session analytics |
| Snapshot | Groups events with the same timestamp | Batch processing |
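The tumbling window is the simplest of these to reason about: every event falls into exactly one fixed-size bucket. The sketch below reproduces the per-device average from the query above in plain Python. `tumbling_avg` is a hypothetical helper, and it uses half-open [start, end) windows for simplicity, which may differ from how Stream Analytics treats events that land exactly on a boundary.

```python
from collections import defaultdict


def tumbling_avg(events, window_seconds=300):
    """events: iterable of (epoch_seconds, device_id, temperature).

    Returns {(device_id, window_end): average_temperature}.
    """
    buckets = defaultdict(list)
    for ts, device_id, temp in events:
        # Each timestamp maps to exactly one non-overlapping window.
        window_start = ts - (ts % window_seconds)
        buckets[(device_id, window_start + window_seconds)].append(temp)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


if __name__ == "__main__":
    readings = [
        (0, "d1", 20.0),
        (299, "d1", 30.0),   # same 5-minute window as the first reading
        (300, "d1", 40.0),   # starts the next window
        (10, "d2", 50.0),
    ]
    for (device, window_end), avg in sorted(tumbling_avg(readings).items()):
        print(device, window_end, avg)
```

A hopping window would place one event into several overlapping buckets instead of one, which is the only structural difference from this code.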
15 Event Hubs vs Service Bus
Event Hubs and Service Bus solve different messaging problems. Event Hubs is a high-throughput streaming log optimized for telemetry ingestion and replay, while Service Bus is a transactional message broker for reliable command/task processing. Choose based on whether you need stream replay and massive scale, or guaranteed delivery with dead-lettering and sessions.
| Dimension | Event Hubs | Service Bus |
| --- | --- | --- |
| Primary purpose | High-throughput event ingestion & streaming | Reliable enterprise message brokering |
| Messaging model | Partitioned log; consumers pull at own pace | Queue / Topic; competitive or pub-sub |
| Message deletion | After retention period (not on consume) | On successful completion |
| Ordering | Per partition (with partition key) | FIFO with sessions |
| Max throughput | Millions events/sec | ~1 million messages/sec (Premium) |
| Max message size | 1 MB (Standard), 1 MB (Premium) | 256 KB (Standard), 100 MB (Premium) |
| Dead-Letter Queue | ✗ No | ✓ Yes |
| Message TTL / expiry | ✓ Yes | ✓ Yes |
| Transactions | ✗ No | ✓ Yes |
| Replay events | ✓ Yes | ✗ No |
| Consumer groups | ✓ Yes | Via topic subscriptions |
| Kafka compatible | ✓ Yes | ✗ No |
| Use when | Telemetry, logs, clickstreams, IoT data | Order processing, workflows, task queues |
16 Event Hubs vs Event Grid
Event Hubs and Event Grid are complementary services often used together. Event Hubs excels at ingesting continuous high-volume data streams with replay capability, while Event Grid is a reactive eventing fabric that pushes discrete state-change notifications to subscribers. Use Event Hubs when you need ordered, replayable streams and Event Grid when you need instant push-based reactions to Azure resource events.
| Dimension | Event Hubs | Event Grid |
| --- | --- | --- |
| Purpose | Ingest & stream high-volume telemetry data | React to discrete events across Azure services |
| Throughput | Millions events/sec (streaming) | ~10M events/sec (eventing) |
| Event size | Up to 1 MB | Up to 1 MB |
| Retention | 1–90 days (time-based) | 24 hours (retry), then dead-letter |
| Pull vs Push | Pull (consumers read at own pace) | Push (Event Grid delivers to handlers) |
| Ordering | Per-partition ordering | No ordering guarantee |
| Replay | ✓ Yes | ✗ No |
| Fan-out to multiple consumers | Via consumer groups | Via subscriptions to same topic |
| Use when | IoT telemetry, logs, stream processing pipelines | Reacting to blob created, resource changes, custom events |
💡 Using All Three Together
A common pattern: IoT devices → Event Hubs (ingest millions/sec) → Stream Analytics (real-time processing) → alerts to Service Bus (reliable delivery) + Event Grid (trigger downstream workflows reactively).
17 Monitoring & Diagnostics
Proactive monitoring is essential for real-time streaming systems where data loss or processing delays have immediate business impact. Track throttling, consumer lag, and error rates to detect issues before they cascade — and use KQL queries in Log Analytics to build dashboards and automated alerts for your Event Hubs namespace.
| Metric | Description | Alert Condition |
| --- | --- | --- |
| Incoming Messages | Events published per interval | Significant drop = producer issue |
| Outgoing Messages | Events consumed per interval | Drop = consumer lag issue |
| Incoming Bytes | Data volume ingress | Approaching TU limit |
| Outgoing Bytes | Data volume egress | Approaching TU limit |
| Throttled Requests | Requests exceeding TU quota | > 0: add TUs or enable auto-inflate |
| Captured Messages | Events written by Capture | Monitor against Incoming |
| Consumer Lag (preview) | How far behind each consumer group is | Growing lag = slow consumer |
| Server Errors | Internal Event Hubs errors | > 0: alert immediately |
KQL — Throttled Requests in Last Hour
kql
AzureMetrics
| where ResourceType == "MICROSOFT.EVENTHUB/NAMESPACES"
| where MetricName == "ThrottledRequests"
| where TimeGenerated > ago(1h)
| summarize TotalThrottled = sum(Total) by bin(TimeGenerated, 5m), Resource
| where TotalThrottled > 0
| order by TimeGenerated desc
KQL — Consumer Group Lag
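kql
// Assumption: the columns used below depend on which diagnostic logs your namespace emits.
// "ArchiveLogs" is the Capture log category, so verify these fields exist before relying on this query.
AzureDiagnostics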
| where ResourceType == "EVENTHUBS"
| where Category == "ArchiveLogs"
| project TimeGenerated, EventHubName_s, PartitionId_s,
OffsetSequenceNumber_d, LastSequenceNumber_d
| extend Lag = LastSequenceNumber_d - OffsetSequenceNumber_d
| where Lag > 10000
| order by Lag desc
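The lag calculation itself is simple once the two sequence numbers are available for each partition: the last enqueued sequence number minus the last checkpointed one. The sketch below shows that arithmetic on plain dictionaries; `consumer_lag` and `lagging_partitions` are hypothetical helpers, and how you obtain the inputs (SDK partition properties, checkpoint store, or metrics) is left open.

```python
def consumer_lag(last_enqueued: dict, checkpointed: dict) -> dict:
    """Return events-behind per partition.

    Both inputs map partition id -> sequence number. A partition with no
    checkpoint is treated as -1, i.e. nothing has been processed yet.
    """
    return {
        partition: last_seq - checkpointed.get(partition, -1)
        for partition, last_seq in last_enqueued.items()
    }


def lagging_partitions(lag: dict, threshold: int = 10000) -> list:
    """Partitions whose lag exceeds the alert threshold, worst first."""
    over = [(p, n) for p, n in lag.items() if n > threshold]
    return sorted(over, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    lag = consumer_lag(
        last_enqueued={"0": 150000, "1": 90000, "2": 40000},
        checkpointed={"0": 120000, "1": 89950},
    )
    print(lag)
    print(lagging_partitions(lag))
```

Lag that keeps growing across samples, rather than a single high reading, is the signal that a consumer cannot keep up.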
18 Geo-DR & High Availability
| Feature | Description | Tier |
| --- | --- | --- |
| Geo-Disaster Recovery (Metadata) | Replicate namespace config to secondary region; manual failover via alias. Messages NOT replicated. | Standard+ |
| Availability Zones | 3-zone redundancy within a region; automatic, no config needed | Standard+ |
| Geo-Replication (Preview) | Full event replication across regions in near real-time | Premium |
| Zone Redundant Namespaces | Namespace resilient to single AZ failure | Standard+ |
⚠️ Geo-DR Does Not Replicate Messages
Standard Geo-DR replicates only namespace metadata (Event Hub configs, consumer groups, policies). In a failover scenario, events not yet processed from the primary region are lost. Use Geo-Replication (Premium) or design producers to dual-write for full message HA.
Configure Geo-DR Alias via CLI
bash
# Pair primary and secondary namespaces
az eventhubs georecovery-alias create \
--resource-group myRG \
--namespace-name myPrimaryNamespace \
--alias myGeoAlias \
--partner-namespace /subscriptions/<sub>/resourceGroups/<rg>/providers/\
Microsoft.EventHub/namespaces/mySecondaryNamespace
# Producers/consumers always use the alias endpoint:
# myGeoAlias.servicebus.windows.net
# Initiate failover (irreversible — use with care)
az eventhubs georecovery-alias fail-over \
--resource-group myRG \
--namespace-name mySecondaryNamespace \
--alias myGeoAlias
19 Architecture Patterns
Event Hubs sits at the center of most Azure real-time architectures. These proven patterns show how to combine Event Hubs with downstream compute and storage services to build end-to-end streaming solutions — from IoT telemetry pipelines to clickstream analytics and Kafka migration strategies.
🌡️ IoT Telemetry Pipeline: Millions of devices → Event Hubs → Stream Analytics (anomaly detection) + Capture (cold storage) + Function Apps (alerts)
🔥 Hot / Cold Path (Lambda): Event Hubs fans out to Stream Analytics (real-time hot path) and Capture → Databricks (batch cold path)
Raw events → Event Hubs → Databricks streaming → Feature Store → model serving in real time
🌉 Kafka Migration: Existing Kafka producers point to the Event Hubs Kafka endpoint, a configuration-only migration onto managed infrastructure for workloads that use supported Kafka features
20 Pricing Overview
Event Hubs pricing is based on throughput units (or processing units for Premium), ingress event count, and optional features like Capture and extended retention. Understanding the cost model helps you right-size your streaming infrastructure: over-provisioning TUs wastes budget, while under-provisioning causes throttling, and producers that do not retry throttled sends can lose data.
Standard Tier
| Resource | Price (approx) |
| --- | --- |
| Throughput Unit (per hour) | ~$0.015/TU/hr (~$11/TU/month) |
| Ingress events (per million) | ~$0.028 |
| Capture (per hour per TU) | ~$0.028 |
| Extended retention (per GB/month) | ~$0.012 |
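The approximate Standard prices above combine into a quick monthly estimate. The sketch below is back-of-the-envelope arithmetic only: it assumes a 730-hour month, uses the approximate figures from the table (actual prices vary by region and change over time), and `estimate_monthly_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730

# Approximate Standard-tier prices from the table above (USD).
TU_PRICE_PER_HOUR = 0.015
INGRESS_PRICE_PER_MILLION = 0.028
CAPTURE_PRICE_PER_TU_HOUR = 0.028


def estimate_monthly_cost(tus: int, events_per_sec: float, capture: bool = False) -> dict:
    """Return a cost breakdown for a steady workload over one month."""
    events_per_month = events_per_sec * HOURS_PER_MONTH * 3600
    tu_cost = tus * TU_PRICE_PER_HOUR * HOURS_PER_MONTH
    ingress_cost = events_per_month / 1_000_000 * INGRESS_PRICE_PER_MILLION
    capture_cost = tus * CAPTURE_PRICE_PER_TU_HOUR * HOURS_PER_MONTH if capture else 0.0
    return {
        "throughput_units": tu_cost,
        "ingress": ingress_cost,
        "capture": capture_cost,
        "total": tu_cost + ingress_cost + capture_cost,
    }


if __name__ == "__main__":
    for name, cost in estimate_monthly_cost(tus=2, events_per_sec=1000, capture=True).items():
        print(f"{name}: ${cost:.2f}")
```

At sustained event rates the per-million ingress charge can exceed the TU charge, so event count matters as much as provisioned capacity.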
Premium Tier
| Resource | Price (approx) |
| --- | --- |
| Processing Unit 1 (PU1) | ~$730/month |
| Processing Unit 2 (PU2) | ~$1,460/month |
| Capture | Included |
| Schema Registry | Included |
💰 Cost Optimization
Enable Auto-inflate on Standard to avoid throttling without over-provisioning TUs. Use Capture's skip-empty-archives setting to avoid paying for empty Avro files. Monitor consumer lag: a slow consumer doesn't reduce your bill but may force longer (and costlier) retention.
21 Quick Reference Cheat Sheet
Keep this cheat sheet handy for connection string formats, SDK patterns, and service limits. These are the most frequently referenced values when building and debugging Event Hubs streaming applications in production.