Azure Monitor — Complete Observability for Azure Workloads
Observability is essential for production workloads. Azure Monitor provides a unified platform for collecting, analyzing, and acting on telemetry from across your Azure environment. This guide covers Application Insights instrumentation, Log Analytics KQL queries, metric and log alerts, workbooks, diagnostic settings, and sampling.
Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│                            Azure Monitor                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐  │
│  │     Metrics     │    │      Logs       │    │   Distributed   │  │
│  │  (Time-series)  │    │ (Log Analytics  │    │     Tracing     │  │
│  │                 │    │      KQL)       │    │ (App Insights)  │  │
│  └────────┬────────┘    └────────┬────────┘    └────────┬────────┘  │
│           │                      │                      │           │
│           └──────────────────────┼──────────────────────┘           │
│                                  ▼                                  │
│                     ┌─────────────────────────┐                     │
│                     │       Dashboards        │                     │
│                     │         Alerts          │                     │
│                     │        Workbooks        │                     │
│                     └─────────────────────────┘                     │
└─────────────────────────────────────────────────────────────────────┘
                                   ▲
                                   │
┌──────────────────────────────────┴──────────────────────────────────┐
│                            Data Sources                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Azure Resources         Azure Functions         VM                 │
│  ───────────────         ───────────────         ──                 │
│  Platform Metrics        Custom Metrics          Agent              │
│  Resource Logs           Application Logs                           │
│                                                                     │
│  Azure App Service       Azure Logic Apps        AKS                │
│  ─────────────────       ────────────────        ───                │
│  Diagnostics             Built-in Metrics        SDK                │
│  Log Stream              Execution Logs                             │
│                                                                     │
│  On-Premise              External APIs           DB                 │
│  ──────────              ─────────────           ──                 │
│  Log Agent               Custom Telemetry        SQL                │
│  WAD Extension           Dead Letter Queue       GW                 │
└─────────────────────────────────────────────────────────────────────┘
Setting Up Azure Monitor
Creating Log Analytics Workspace
# Create Resource Group
az group create --name monitor-rg --location eastus
# Create Log Analytics Workspace
az monitor log-analytics workspace create \
--resource-group monitor-rg \
--workspace-name azmonitor-la \
--retention-days 30 \
--sku PerGB2018
# Get workspace details
az monitor log-analytics workspace show \
--resource-group monitor-rg \
--workspace-name azmonitor-la \
--query "customerId" -o tsv
Application Insights Setup
# Create Application Insights (classic). Classic resources were retired in February 2024,
# so prefer the workspace-based form below for anything new.
az monitor app-insights component create \
--app az-integration-hub \
--location eastus \
--resource-group monitor-rg \
--retention-time 30 \
--application-type web
# Get the connection string (preferred over the bare instrumentation key)
az monitor app-insights component show \
--app az-integration-hub \
--resource-group monitor-rg \
--query "connectionString" -o tsv
# Create Application Insights (Workspace-based - recommended)
az monitor app-insights component create \
--app az-integration-hub-wb \
--location eastus \
--resource-group monitor-rg \
--retention-time 30 \
--workspace "/subscriptions/{sub-id}/resourceGroups/monitor-rg/providers/Microsoft.OperationalInsights/workspaces/azmonitor-la"
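The `--workspace` argument above takes a full Azure Resource Manager ID, and the same ID shape recurs in later commands. As a minimal sketch, a small helper can assemble it from its parts; `workspaceResourceId` is a hypothetical name, not part of any Azure SDK, and `{sub-id}` remains a placeholder for your subscription ID.

```typescript
// Hypothetical helper: builds the ARM resource ID of a Log Analytics workspace.
function workspaceResourceId(
  subscriptionId: string,
  resourceGroup: string,
  workspaceName: string
): string {
  // The leading empty segment produces the leading "/" after join().
  return [
    "",
    "subscriptions", subscriptionId,
    "resourceGroups", resourceGroup,
    "providers", "Microsoft.OperationalInsights",
    "workspaces", workspaceName,
  ].join("/");
}

// Same values as the CLI examples; "{sub-id}" is still a placeholder.
const workspaceId = workspaceResourceId("{sub-id}", "monitor-rg", "azmonitor-la");
```

Generating the ID in one place avoids typos when the same workspace is referenced from several scripts.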
Application Insights (APM)
.NET Application Setup
# Install the NuGet package
dotnet add package Microsoft.ApplicationInsights.AspNetCore
// Program.cs
using Microsoft.ApplicationInsights.Extensibility;
var builder = WebApplication.CreateBuilder(args);
// Add Application Insights
builder.Services.AddApplicationInsightsTelemetry(options =>
{
options.ConnectionString = builder.Configuration["APPLICATIONINSIGHTS_CONNECTION_STRING"];
options.EnableAdaptiveSampling = false; // Turn off the default so the custom sampling below applies
options.EnableQuickPulseMetricStream = true; // Live Metrics
});
// Configure adaptive sampling for high-volume applications
builder.Services.Configure<TelemetryConfiguration>(config =>
{
config.DefaultTelemetrySink.TelemetryProcessorChainBuilder
.UseAdaptiveSampling(maxTelemetryItemsPerSecond: 5)
.Build();
});
// Stream logs to App Service diagnostics.
// Requires the Microsoft.Extensions.Logging.AzureAppServices package.
builder.Logging.AddAzureWebAppDiagnostics();
var app = builder.Build();
app.Run();
Configuration (appsettings.json)
{
"ApplicationInsights": {
"ConnectionString": "InstrumentationKey=xxx;IngestionEndpoint=https://eastus-0.in.applicationinsights.azure.com/",
"EnableAdaptiveSampling": true,
"EnablePerformanceCounterCollectionModule": true,
"EnableRequestTrackingTelemetryModule": true,
"EnableDependencyTrackingTelemetryModule": true
},
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft.ApplicationInsights": "Warning"
}
}
}
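The connection string in the configuration above is a semicolon-separated list of `Key=Value` pairs. The sketch below shows one way to split it for validation or logging of the ingestion endpoint; `parseConnectionString` is a hypothetical helper written for this article, and the SDKs do their own parsing internally.

```typescript
// Hypothetical helper: splits "Key=Value;Key=Value" into a lookup object.
function parseConnectionString(value: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const part of value.split(";")) {
    const separator = part.indexOf("=");
    if (separator <= 0) {
      continue; // skip empty or malformed segments
    }
    const key = part.slice(0, separator).trim();
    // Split on the first "=" only, so values may contain "=" themselves.
    result[key] = part.slice(separator + 1).trim();
  }
  return result;
}

const parsed = parseConnectionString(
  "InstrumentationKey=xxx;IngestionEndpoint=https://eastus-0.in.applicationinsights.azure.com/"
);
// parsed.InstrumentationKey and parsed.IngestionEndpoint now hold the two parts.
```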
Custom Event Tracking
public class OrderService
{
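private readonly ILogger<OrderService> _logger;
private readonly TelemetryClient _telemetry;
// IPaymentService is a placeholder for your own payment client abstraction
private readonly IPaymentService _paymentService;
public OrderService(ILogger<OrderService> logger, TelemetryClient telemetry, IPaymentService paymentService)
{
_logger = logger;
_telemetry = telemetry;
_paymentService = paymentService;
}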
public async Task<OrderResult> CreateOrderAsync(OrderRequest request)
{
// Track custom event
var properties = new Dictionary<string, string>
{
["CustomerId"] = request.CustomerId,
["OrderType"] = request.Type,
["Region"] = request.Region
};
var metrics = new Dictionary<string, double>
{
["OrderValue"] = request.Amount,
["ItemCount"] = request.Items.Count
};
_telemetry.TrackEvent("OrderCreated", properties, metrics);
// Track dependency call
using (var op = _telemetry.StartOperation<DependencyTelemetry>("ProcessPayment"))
{
op.Telemetry.Type = "HTTP";
op.Telemetry.Name = "Stripe Payment API";
op.Telemetry.Target = "api.stripe.com";
op.Telemetry.Data = request.PaymentIntentId;
try
{
var paymentResult = await _paymentService.ProcessPaymentAsync(request);
op.Telemetry.Success = true;
op.Telemetry.ResultCode = "200";
return paymentResult;
}
catch (Exception ex)
{
op.Telemetry.Success = false;
op.Telemetry.ResultCode = "500";
_telemetry.TrackException(ex);
throw;
}
}
}
}
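`TrackEvent` takes string-valued properties and numeric measurements as two separate dictionaries. As a rough sketch of that split, the helper below sorts a mixed record into the two groups by value type; `buildEvent` and `EventPayload` are names invented for this example, and the sample values are illustrative.

```typescript
// Shape of a custom event: strings go to properties, numbers to measurements.
type EventPayload = {
  name: string;
  properties: Record<string, string>;
  measurements: Record<string, number>;
};

// Hypothetical helper: routes each field by its runtime type.
function buildEvent(
  name: string,
  fields: Record<string, string | number>
): EventPayload {
  const properties: Record<string, string> = {};
  const measurements: Record<string, number> = {};
  for (const key of Object.keys(fields)) {
    const value = fields[key];
    if (typeof value === "number") {
      measurements[key] = value;
    } else if (typeof value === "string") {
      properties[key] = value;
    }
  }
  return { name, properties, measurements };
}

// Illustrative values only.
const orderCreated = buildEvent("OrderCreated", {
  CustomerId: "c-42",
  Region: "eastus",
  OrderValue: 129.5,
  ItemCount: 3,
});
```

Keeping numeric values in measurements means they can be aggregated in KQL without casting from strings.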
Request Telemetry and Custom Dimensions
// Custom Telemetry Middleware
public class TelemetryMiddleware
{
private readonly RequestDelegate _next;
private readonly TelemetryClient _telemetry;
public TelemetryMiddleware(RequestDelegate next, TelemetryClient telemetry)
{
_next = next;
_telemetry = telemetry;
}
public async Task InvokeAsync(HttpContext context)
{
// Start operation tracking. The ASP.NET Core SDK already tracks requests automatically,
// so use this middleware only when automatic request tracking is disabled.
using (var operation = _telemetry.StartOperation<RequestTelemetry>($"{context.Request.Method} {context.Request.Path}"))
{
// GetDisplayUrl() requires: using Microsoft.AspNetCore.Http.Extensions;
operation.Telemetry.Url = new Uri(context.Request.GetDisplayUrl());
// Add custom properties
operation.Telemetry.Properties["UserId"] = context.User.FindFirst(ClaimTypes.NameIdentifier)?.Value ?? "anonymous";
operation.Telemetry.Properties["CorrelationId"] = context.TraceIdentifier;
operation.Telemetry.Properties["ClientIp"] = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
// Measure elapsed time
var sw = Stopwatch.StartNew();
try
{
await _next(context);
}
finally
{
sw.Stop();
operation.Telemetry.Duration = sw.Elapsed;
// The status code is known only after the rest of the pipeline has run
operation.Telemetry.ResponseCode = context.Response.StatusCode.ToString();
// Track slow requests
if (sw.ElapsedMilliseconds > 1000)
{
_telemetry.TrackEvent("SlowRequest", new Dictionary<string, string>
{
["Path"] = context.Request.Path,
["Duration"] = sw.ElapsedMilliseconds.ToString()
});
}
}
}
}
}
Node.js Application Insights
// The Node.js SDK is published as "applicationinsights" and is configured through setup()
import * as appInsights from "applicationinsights";
import express from "express";
appInsights
.setup(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING)
.setAutoDependencyCorrelation(true)
.start();
// The Node.js SDK supports fixed-rate sampling only; 20% is an illustrative value
appInsights.defaultClient.config.samplingPercentage = 20;
// Track custom events
const trackOrderCreated = (order: Order) => {
appInsights.defaultClient.trackEvent({
name: "OrderCreated",
properties: {
customerId: order.customerId,
orderId: order.id,
value: order.total,
region: order.region,
},
measurements: {
itemCount: order.items.length,
totalValue: order.total,
},
});
};
// Track dependencies (HTTP calls)
const trackApiCall = async (url: string, method: string) => {
const startTime = Date.now();
try {
const response = await fetch(url, { method });
appInsights.defaultClient.trackDependency({
target: new URL(url).host,
name: `${method} ${url}`,
data: url,
duration: Date.now() - startTime,
success: response.ok,
resultCode: response.status,
dependencyTypeName: "HTTP",
});
return response;
} catch (error) {
appInsights.defaultClient.trackDependency({
target: new URL(url).host,
name: `${method} ${url}`,
data: url,
duration: Date.now() - startTime,
success: false,
resultCode: 0, // no HTTP response was received
dependencyTypeName: "HTTP",
});
throw error;
}
};
// Track exceptions
process.on("uncaughtException", (error) => {
appInsights.defaultClient.trackException({ exception: error });
});
export default appInsights;
Log Analytics and KQL Queries
Common KQL Queries for Diagnostics
// Failed requests in the last hour
requests
| where timestamp > ago(1h)
| where success == false
| project timestamp, name, resultCode, duration, operation_Id
| order by timestamp desc
// Slowest requests
requests
| where timestamp > ago(24h)
| summarize avg(duration) by name
| order by avg_duration desc
| take 10
// Request failure rate by endpoint
requests
| where timestamp > ago(1h)
| summarize
total = count(),
failures = countif(success == false)
by name
| extend failureRate = failures * 100.0 / total
| where failureRate > 5
// Dependencies (database calls, API calls)
dependencies
| where timestamp > ago(1h)
| summarize
callCount = count(),
avgDuration = avg(duration),
p95Duration = percentile(duration, 95)
by name, target
| order by avgDuration desc
// Exceptions with stack traces
exceptions
| where timestamp > ago(24h)
| project timestamp, type, outerMessage, details
| order by timestamp desc
// Track custom events (e.g., button clicks, conversions)
customEvents
| where name == "OrderCreated"
| where timestamp > ago(7d)
| summarize orderCount = count() by bin(timestamp, 1d)
| render timechart
// Sessions and average request duration per user
requests
| where timestamp > ago(24h)
| summarize
sessionCount = dcount(session_Id),
avgRequestDuration = avg(duration)
by user_Id
| order by sessionCount desc
// Correlate traces with requests (the request identifier column is named "id")
traces
| where timestamp > ago(1h)
| join kind=inner (
requests
| where timestamp > ago(1h)
| project requestId = id, operation_Id
) on operation_Id
| project timestamp, message, requestId, operation_Id
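To unit-test alert thresholds outside of Log Analytics, it can help to mirror the `summarize` logic of the failure-rate and percentile queries in application code. The sketch below does that over an in-memory array; `RequestRecord`, `failureRateByName`, and `percentile` are hypothetical names, and the percentile uses the simple nearest-rank method, whereas KQL's `percentile()` is an approximation that can return slightly different values on large data sets.

```typescript
interface RequestRecord {
  name: string;
  success: boolean;
  duration: number; // milliseconds
}

// Mirrors: summarize total = count(), failures = countif(success == false) by name
function failureRateByName(records: RequestRecord[]): Map<string, number> {
  const totals = new Map<string, { total: number; failures: number }>();
  for (const record of records) {
    const entry = totals.get(record.name) ?? { total: 0, failures: 0 };
    entry.total += 1;
    if (!record.success) {
      entry.failures += 1;
    }
    totals.set(record.name, entry);
  }
  const rates = new Map<string, number>();
  totals.forEach((entry, name) => {
    rates.set(name, (entry.failures * 100) / entry.total);
  });
  return rates;
}

// Nearest-rank percentile over a small sample (assumption: not KQL's algorithm).
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error("percentile of an empty list is undefined");
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1] as number;
}

// Illustrative sample data.
const sampleRequests: RequestRecord[] = [
  { name: "GET /api/orders", success: true, duration: 120 },
  { name: "GET /api/orders", success: false, duration: 300 },
  { name: "GET /api/orders", success: true, duration: 80 },
  { name: "GET /api/orders", success: true, duration: 95 },
  { name: "POST /api/payments", success: true, duration: 110 },
];
const failureRates = failureRateByName(sampleRequests);
const p95Duration = percentile(sampleRequests.map((r) => r.duration), 95);
```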
Advanced Correlation Queries
// End-to-end request flow (trace -> request -> dependency -> exception)
let timeframe = ago(1h);
let operationId = "abc123";
union withsource=sourceTable requests, dependencies, traces, exceptions
| where timestamp > timeframe
| where operation_Id == operationId
| project timestamp, sourceTable, name, message
| order by timestamp asc
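// Find users with failed payments
// (assumes userId is recorded in customDimensions on both requests and events)
requests
| where name == "POST /api/payments"
| where success == false
| extend userId = tostring(customDimensions.userId)
| project userId, timestamp, resultCode
| join kind=inner (
customEvents
| where name == "PaymentAttempt"
| extend userId = tostring(customDimensions.userId)
| distinct userId
) on userId
| summarize failedPayments = count() by userId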
// Detect anomalies in request patterns (last hour vs. a 7-day hourly baseline)
let baseline = requests
| where timestamp between (ago(7d) .. ago(1h))
| summarize requestCount = count() by bin(timestamp, 1h)
| summarize avgRequests = avg(requestCount), stdevRequests = stdev(requestCount)
| extend upperThreshold = avgRequests + (3 * stdevRequests);
requests
| where timestamp > ago(1h)
| summarize currentCount = count()
| extend upperThreshold = toscalar(baseline | project upperThreshold)
| where currentCount > upperThreshold
// Performance degradation detection (day-over-day change in median duration)
requests
| where timestamp > ago(7d)
| where name == "GET /api/orders"
| summarize p50 = percentile(duration, 50) by day = bin(timestamp, 1d)
| order by day asc
| extend previousP50 = prev(p50)
| extend changePercent = (p50 - previousP50) / previousP50 * 100
| order by day desc
| take 2
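The anomaly and degradation queries both reduce to simple arithmetic: a mean plus three standard deviations as an upper bound, and a percentage change between two values. A minimal sketch of that arithmetic follows, useful for sanity-checking thresholds against known numbers. The function names are hypothetical, and the standard deviation uses the sample form (dividing by n - 1), which is what KQL's `stdev()` computes.

```typescript
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Sample standard deviation (n - 1 in the denominator), like KQL stdev().
function sampleStdev(values: number[]): number {
  if (values.length < 2) {
    return 0;
  }
  const m = mean(values);
  const sumSquares = values.reduce((sum, v) => sum + (v - m) * (v - m), 0);
  return Math.sqrt(sumSquares / (values.length - 1));
}

// Upper bound for "normal" hourly request counts: mean + sigmas * stdev.
function upperThreshold(hourlyCounts: number[], sigmas: number = 3): number {
  return mean(hourlyCounts) + sigmas * sampleStdev(hourlyCounts);
}

// Percentage change from a previous value to a current one.
function percentChange(previous: number, current: number): number {
  return ((current - previous) / previous) * 100;
}

// Illustrative numbers: five hourly counts as the baseline.
const baselineCounts = [10, 12, 11, 9, 13];
const threshold = upperThreshold(baselineCounts); // about 15.74
const isAnomalous = 16 > threshold;
const p50Change = percentChange(100, 150);
```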
Metrics and Alerting
Creating Metric Alerts
# Alert on high CPU usage ({vm-name} is a placeholder for the target VM)
az monitor metrics alert create \
--name "HighCPUAlert" \
--resource-group monitor-rg \
--scopes "/subscriptions/{sub-id}/resourceGroups/monitor-rg/providers/Microsoft.Compute/virtualMachines/{vm-name}" \
--condition "avg Percentage CPU > 80" \
--description "CPU usage above 80%" \
--auto-mitigate true \
--evaluation-frequency 5m \
--window-size 15m
# Alert on failed requests (Application Insights)
az monitor metrics alert create \
--name "FailedRequestsAlert" \
--resource-group monitor-rg \
--scopes "/subscriptions/{sub-id}/resourceGroups/monitor-rg/providers/Microsoft.Insights/components/az-integration-hub" \
--condition "count requests/failed > 10" \
--evaluation-frequency 5m \
--window-size 15m
# Alert on response time degradation (requests/duration is reported in milliseconds)
az monitor metrics alert create \
--name "SlowResponseAlert" \
--resource-group monitor-rg \
--scopes "/subscriptions/{sub-id}/resourceGroups/monitor-rg/providers/Microsoft.Insights/components/az-integration-hub" \
--condition "avg requests/duration > 5000" \
--description "Average server response time above 5 seconds"
Log Analytics Alert Rules
# Create scheduled query alert for error spikes (requires the scheduled-query CLI extension)
az monitor scheduled-query create \
--name "ErrorSpikeAlert" \
--resource-group monitor-rg \
--location eastus \
--description "Alert when errors exceed threshold" \
--scopes "/subscriptions/{sub-id}/resourceGroups/monitor-rg/providers/Microsoft.Insights/components/az-integration-hub" \
--evaluation-frequency 5m \
--window-size 15m \
--condition "count 'ErrorQuery' > 10 at least 2 violations out of 2 aggregated points" \
--condition-query ErrorQuery="exceptions"
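The rule above fires only when the threshold is breached in at least 2 of the last 2 evaluation periods, which filters out one-off spikes. The following sketch shows that decision logic in isolation; `shouldFire` is a hypothetical function that illustrates the idea and is not how Azure Monitor implements it.

```typescript
// Returns true when at least `minFailingPeriods` of the most recent
// `evaluationPeriods` values exceed the threshold.
function shouldFire(
  values: number[],
  threshold: number,
  evaluationPeriods: number,
  minFailingPeriods: number
): boolean {
  const recent = values.slice(-evaluationPeriods);
  const failing = recent.filter((value) => value > threshold).length;
  return failing >= minFailingPeriods;
}

// Exception counts per evaluation period, oldest first (illustrative values).
const sustainedSpike = shouldFire([3, 12, 15], 10, 2, 2);
const singleSpike = shouldFire([12, 15, 4], 10, 2, 2);
```

Here `sustainedSpike` is true because the last two periods both exceed 10, while `singleSpike` is false because the most recent period has already recovered.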
Alert Actions
# Create Action Group for email, webhooks, and SMS
# Receivers use the form: --action TYPE NAME ARGS (SMS takes country code, then number)
az monitor action-group create \
--name "DevOpsAlerts" \
--resource-group monitor-rg \
--short-name devops \
--action email "OnCallTeam" "oncall@company.com" \
--action webhook "Slack" "https://hooks.slack.com/services/xxx" \
--action webhook "PagerDuty" "https://events.pagerduty.com/v2/enqueue" \
--action sms "OnCallPhone" 1 234567890
# Link action group to alert
az monitor metrics alert update \
--name "HighCPUAlert" \
--resource-group monitor-rg \
--add-action "/subscriptions/{sub-id}/resourceGroups/monitor-rg/providers/Microsoft.Insights/actionGroups/DevOpsAlerts"
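On the receiving side, a webhook target has to interpret the alert payload. The sketch below assumes the action group's webhooks are configured to send the common alert schema, and it models only a few fields of `data.essentials`. The interface and function names are invented for this example, so check the payload your receiver actually gets before relying on any field.

```typescript
// Subset of the common alert schema (assumed; verify against a real payload).
interface AlertEssentials {
  alertRule: string;
  severity: string; // e.g. "Sev2"
  monitorCondition: string; // e.g. "Fired" or "Resolved"
  firedDateTime?: string;
}

interface CommonAlertPayload {
  schemaId: string;
  data: { essentials: AlertEssentials };
}

// Hypothetical formatter for a chat or paging message.
function summarizeAlert(payload: CommonAlertPayload): string {
  const { alertRule, severity, monitorCondition } = payload.data.essentials;
  return `[${severity}] ${alertRule}: ${monitorCondition}`;
}

// Illustrative payload, not captured from a real alert.
const samplePayload: CommonAlertPayload = {
  schemaId: "azureMonitorCommonAlertSchema",
  data: {
    essentials: {
      alertRule: "HighCPUAlert",
      severity: "Sev2",
      monitorCondition: "Fired",
    },
  },
};
const alertSummary = summarizeAlert(samplePayload);
```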
Azure Functions Monitoring
Application Insights for Functions
// Azure Function with proper telemetry
import { app, HttpRequest, HttpResponseInit, InvocationContext } from "@azure/functions";
import { TelemetryClient } from "applicationinsights";
const telemetryClient = new TelemetryClient(
process.env.APPLICATIONINSIGHTS_CONNECTION_STRING
);
export async function httpTrigger1(
request: HttpRequest,
context: InvocationContext
): Promise<HttpResponseInit> {
context.log("HTTP trigger processed a request.");
// Track custom event
telemetryClient.trackEvent({
name: "FunctionInvocation",
properties: {
functionName: context.functionName,
invocationId: context.invocationId,
},
});
// Track dependency
const startTime = Date.now();
try {
await callExternalApi();
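telemetryClient.trackDependency({
name: "External API",
data: "callExternalApi",
dependencyTypeName: "HTTP",
resultCode: 200,
duration: Date.now() - startTime,
success: true,
});
} catch (error) {
telemetryClient.trackDependency({
name: "External API",
data: "callExternalApi",
dependencyTypeName: "HTTP",
resultCode: 0, // no result code is available when the call throws
duration: Date.now() - startTime,
success: false,
});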
telemetryClient.trackException({ exception: error as Error });
throw error;
}
return { body: `Hello, ${request.query.get("name") || "world"}!` };
}
app.http("httpTrigger1", {
methods: ["get", "post"],
handler: httpTrigger1,
});
Timer Trigger with Monitoring
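import { app, InvocationContext, Timer } from "@azure/functions";
import { TelemetryClient } from "applicationinsights";
const telemetryClient = new TelemetryClient(
process.env.APPLICATIONINSIGHTS_CONNECTION_STRING
);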
export async function timerTrigger1(timer: Timer, context: InvocationContext): Promise<void> {
context.log("Timer trigger fired.");
const startTime = Date.now();
let success = true;
let itemsProcessed = 0;
try {
// Simulate processing
itemsProcessed = await processQueueMessages();
telemetryClient.trackEvent({
name: "TimerFunctionCompleted",
properties: {
functionName: context.functionName,
status: "success",
},
measurements: {
duration: Date.now() - startTime,
itemsProcessed: itemsProcessed,
},
});
} catch (error) {
success = false;
telemetryClient.trackException({ exception: error as Error });
telemetryClient.trackEvent({
name: "TimerFunctionFailed",
properties: {
functionName: context.functionName,
error: (error as Error).message,
},
});
} finally {
// Always log execution metrics
context.log(`Function completed in ${Date.now() - startTime}ms, items: ${itemsProcessed}, success: ${success}`);
}
}
app.timer("timerTrigger1", {
schedule: "0 */5 * * * *", // Every 5 minutes
handler: timerTrigger1,
});
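The timer schedule above is an NCRONTAB expression, which has six fields rather than the five of classic cron: second, minute, hour, day, month, and day of week. As a small sketch, the helper below labels each field so that a schedule can be logged or validated at startup; `parseNcrontab` is a hypothetical name and it only splits the expression, without validating the individual field values.

```typescript
const FIELD_NAMES = ["second", "minute", "hour", "day", "month", "dayOfWeek"] as const;
type ScheduleField = (typeof FIELD_NAMES)[number];

// Hypothetical helper: labels the six whitespace-separated NCRONTAB fields.
function parseNcrontab(expression: string): Record<ScheduleField, string> {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== FIELD_NAMES.length) {
    throw new Error(`expected 6 fields, got ${parts.length}`);
  }
  const result = {} as Record<ScheduleField, string>;
  FIELD_NAMES.forEach((name, index) => {
    result[name] = parts[index] as string;
  });
  return result;
}

// "0 */5 * * * *": at second 0 of every fifth minute.
const schedule = parseNcrontab("0 */5 * * * *");
```

A five-field cron string passed by mistake fails fast here instead of silently running on an unintended schedule.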
Azure Service Bus Monitoring
Monitoring Queue Metrics
# Get queue metrics
az servicebus queue show \
--resource-group monitor-rg \
--namespace-name sb-namespace \
--name orders-queue \
--query "[countDetails.activeMessageCount, countDetails.deadLetterMessageCount]"
# Create alert for queue depth (scoped to the namespace, filtered to one queue)
az monitor metrics alert create \
--name "QueueDepthAlert" \
--resource-group monitor-rg \
--scopes "/subscriptions/{sub-id}/resourceGroups/monitor-rg/providers/Microsoft.ServiceBus/namespaces/sb-namespace" \
--condition "avg ActiveMessages > 1000 where EntityName includes orders-queue" \
--description "Queue has more than 1000 active messages"
KQL for Service Bus Logs
// Messages in dead letter queue
AzureDiagnostics
| where ResourceType == "SERVICE_BUS_QUEUES"
| where OperationName == "DeadletteredMessages"
| project TimeGenerated, Resource, MessageId, DeadLetterReason
// Message processing latency
AzureDiagnostics
| where ResourceType == "SERVICE_BUS_QUEUES"
| where OperationName == "Process"
| project TimeGenerated, Resource, DurationMs, MessageId
| order by TimeGenerated desc
// Failed message operations
AzureDiagnostics
| where ResourceType == "SERVICE_BUS_QUEUES"
| where ResultType == "Failure"
| project TimeGenerated, Resource, OperationName, ResultCode, ErrorMessage
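// Queue activity over time
// (incoming and outgoing message counts are platform metrics, so they land in
// AzureMetrics when metrics are routed to the workspace via diagnostic settings)
AzureMetrics
| where ResourceProvider == "MICROSOFT.SERVICEBUS"
| where MetricName in ("IncomingMessages", "OutgoingMessages")
| summarize Messages = sum(Total) by MetricName, bin(TimeGenerated, 1h)
| render timechart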
Azure Event Hub Monitoring
Monitoring Throughput
// Incoming messages by partition
AzureDiagnostics
| where ResourceType == "EVENTHUBS"
| where OperationName == "Incoming"
| summarize IncomingMessages = count() by PartitionId, bin(TimeGenerated, 5m)
| render timechart
// Consumer lag detection
AzureDiagnostics
| where ResourceType == "EVENTHUBS"
| where OperationName == "Process"
| project TimeGenerated, ConsumerGroup, PartitionId, SequenceNumber
| summarize max(SequenceNumber) by ConsumerGroup, PartitionId
// Throttling events
AzureDiagnostics
| where ResourceType == "EVENTHUBS"
| where OperationName == "ThrottlingException"
| project TimeGenerated, Resource, ErrorMessage
Workbooks and Dashboards
Creating Custom Workbooks
workbook.json (Azure Monitor Workbook definition):
{
"version": "Notebook/1.0",
"items": [
{
"type": 1,
"content": {
"json": "## Application Overview\nReal-time performance metrics"
}
},
{
"type": 3,
"content": {
"query": "requests | where timestamp > ago(1h) | summarize count(), avg(duration) by name",
"title": "Requests by Endpoint",
"visualization": "table"
}
},
{
"type": 3,
"content": {
"query": "exceptions | where timestamp > ago(24h) | summarize count() by type",
"title": "Exception Distribution",
"visualization": "piechart"
}
},
{
"type": 3,
"content": {
"query": "requests | where timestamp > ago(1h) | summarize avg(duration) by bin(timestamp, 5m)",
"title": "Response Time Trend",
"visualization": "areachart"
}
}
]
}
# Deploy workbook (requires the application-insights CLI extension;
# the workbook resource name must be a GUID)
az monitor app-insights workbook create \
--resource-group monitor-rg \
--name "$(uuidgen)" \
--display-name "AppDashboard" \
--category workbook \
--kind shared \
--location eastus \
--serialized-data @workbook.json \
--source-id "/subscriptions/{sub-id}/resourcegroups/monitor-rg/providers/microsoft.insights/components/az-integration-hub"
Diagnostic Settings for Platform Logs
Enabling Diagnostic Logs
# Function App diagnostics
az monitor diagnostic-settings create \
--name "FunctionAppLogs" \
--resource "/subscriptions/{sub-id}/resourceGroups/monitor-rg/providers/Microsoft.Web/sites/az-function-app" \
--workspace "/subscriptions/{sub-id}/resourceGroups/monitor-rg/providers/Microsoft.OperationalInsights/workspaces/azmonitor-la" \
--logs '[
{"category": "FunctionAppLogs", "enabled": true}
]' \
--metrics '[
{"category": "AllMetrics", "enabled": true}
]'
# Event Hub diagnostics
az monitor diagnostic-settings create \
--name "EventHubLogs" \
--resource "/subscriptions/{sub-id}/resourceGroups/monitor-rg/providers/Microsoft.EventHub/namespaces/sb-namespace" \
--workspace "/subscriptions/{sub-id}/resourceGroups/monitor-rg/providers/Microsoft.OperationalInsights/workspaces/azmonitor-la" \
--logs '[
{"category": "ArchiveLogs", "enabled": true},
{"category": "OperationalLogs", "enabled": true},
{"category": "AutoScaleLogs", "enabled": true}
]'
Categories of Diagnostic Logs
| Resource Type | Available Logs |
|---|---|
| Azure Functions | FunctionAppLogs |
| API Management | GatewayLogs, WebSocketConnectionLogs |
| Service Bus | OperationalLogs, VNetAndIPFilteringLogs, RuntimeAuditLogs |
| Event Hub | ArchiveLogs, OperationalLogs, AutoScaleLogs |
| Logic Apps | WorkflowRuntime |
| Key Vault | AuditEvent |
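The `--logs` argument of `az monitor diagnostic-settings create` is a JSON array with one entry per category, so it is easy to generate from a list like the table above. Here is a minimal sketch; `buildLogSettings` is a hypothetical helper, and the supported category names for a given resource should be confirmed with `az monitor diagnostic-settings categories list --resource <resource-id>` before use.

```typescript
interface LogSetting {
  category: string;
  enabled: boolean;
}

// Hypothetical helper: one enabled entry per diagnostic log category.
function buildLogSettings(categories: string[]): LogSetting[] {
  return categories.map((category) => ({ category, enabled: true }));
}

// Produces the string to pass as the --logs argument.
const logsArgument = JSON.stringify(
  buildLogSettings(["ArchiveLogs", "OperationalLogs"])
);
```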
Sampling Strategies
Adaptive Sampling for High-Volume Apps
// Program.cs - Configure adaptive sampling
builder.Services.Configure<TelemetryConfiguration>(config =>
{
config.DefaultTelemetrySink.TelemetryProcessorChainBuilder
.UseAdaptiveSampling(
maxTelemetryItemsPerSecond: 10,
excludedTypes: "Exception;Trace" // Never sample out exceptions or traces
)
// CustomTelemetryProcessor stands for your own ITelemetryProcessor implementation
.Use((next) => new CustomTelemetryProcessor(next))
.Build();
});
// Node.js - Sampling configuration
// The Node.js SDK supports fixed-rate sampling only (no adaptive sampling)
import * as appInsights from "applicationinsights";
appInsights
.setup(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING)
.setAutoDependencyCorrelation(true)
.start();
// Illustrative value: keep roughly one in four telemetry items
appInsights.defaultClient.config.samplingPercentage = 25;
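Sampling only preserves end-to-end traces if every item belonging to one operation gets the same keep-or-drop decision. One common way to achieve that is to derive the decision from a hash of the operation ID rather than from a random number. The sketch below illustrates the idea with a simple string hash; it is not the algorithm the Application Insights SDKs use, and both function names are hypothetical.

```typescript
// Maps an ID to a stable pseudo-random number in [0, 1).
function hashToUnitInterval(id: string): number {
  let hash = 5381;
  for (let i = 0; i < id.length; i++) {
    // Multiply-and-xor step, kept as an unsigned 32-bit integer.
    hash = ((hash * 33) ^ id.charCodeAt(i)) >>> 0;
  }
  return hash / 0x100000000;
}

// Same operation ID always yields the same decision for a given percentage.
function shouldSample(operationId: string, samplingPercentage: number): boolean {
  return hashToUnitInterval(operationId) * 100 < samplingPercentage;
}

// Every request, dependency, and trace sharing "abc123" is kept or dropped together.
const keepRequest = shouldSample("abc123", 25);
const keepDependency = shouldSample("abc123", 25);
```

Because the decision depends only on the ID, separate services that apply the same percentage will retain the same set of operations.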
Best Practices
- Use Workspace-Based Application Insights: Better data retention and cross-resource queries
- Implement Custom Dimensions: Add business context to telemetry for better filtering
- Set Appropriate Sampling: Balance data volume with observability
- Create Alert Baselines: Avoid alert fatigue by understanding normal patterns
- Use KQL for Troubleshooting: Build reusable queries for common scenarios
- Correlate Distributed Traces: Use operation_Id to trace requests across services
- Monitor Cost: Set up budget alerts for Log Analytics data ingestion
- Retain Based on Need: Different data types have different retention requirements
- Automate Response: Use Logic Apps for automated remediation
- Document Your Dashboards: Explain what each chart means for onboarding
Summary
Azure Monitor provides comprehensive observability:
- Application Insights: APM with distributed tracing and custom telemetry
- Log Analytics: Store, query, and analyze logs with KQL
- Metrics: Real-time performance data with alerting
- Alerts: Proactive notification based on conditions
- Workbooks: Visual dashboards for operations
Build a monitoring strategy that covers all layers: application, infrastructure, and business metrics.