Alert Rules & Action Groups
Overview
Azure Monitor Alerts proactively notify you when conditions in your telemetry indicate a problem. Combined with Action Groups, they form a complete incident notification pipeline — from detection to response.
Alert Types
| Type | Signal Source | Use Case |
|---|---|---|
| Metric Alert | Platform metrics, custom metrics | CPU > 80%, response time > 2s |
| Log Alert | Log Analytics / KQL query | Error count spike, specific exception |
| Activity Log Alert | Azure control plane | Resource deleted, deployment failed |
| Smart Detection | Application Insights ML | Anomaly in failure rate, response time |
Metric Alerts
Metric alerts evaluate platform or custom metrics at regular intervals:
Azure CLI — Create a Metric Alert
# CpuPercentage is an App Service plan (Microsoft.Web/serverfarms) metric,
# so the scope is the plan (placeholder name myPlan), not the site.
az monitor metrics alert create \
--name "HighCPU-AppService" \
--resource-group myRG \
--scopes "/subscriptions/{sub}/resourceGroups/myRG/providers/Microsoft.Web/serverfarms/myPlan" \
--condition "avg CpuPercentage > 80" \
--window-size 5m \
--evaluation-frequency 1m \
--severity 2 \
--action "/subscriptions/{sub}/resourceGroups/myRG/providers/Microsoft.Insights/actionGroups/OpsTeam"
Bicep / ARM Template
resource cpuAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'HighCPU-AppService'
  location: 'global'
  properties: {
    severity: 2
    enabled: true
    // CpuPercentage is emitted by the App Service plan (Microsoft.Web/serverfarms),
    // so the scope is the plan resource, assumed to be declared as appServicePlan.
    scopes: [appServicePlan.id]
    evaluationFrequency: 'PT1M'
    windowSize: 'PT5M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          criterionType: 'StaticThresholdCriterion' // required for static thresholds
          name: 'CPUCheck'
          metricName: 'CpuPercentage'
          operator: 'GreaterThan'
          threshold: 80
          timeAggregation: 'Average'
        }
      ]
    }
    actions: [{ actionGroupId: actionGroup.id }]
  }
}
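To make the interaction between the 5-minute window, the 1-minute evaluation frequency, and the static threshold of 80 concrete, here is a minimal Python sketch that replays the rule above against made-up per-minute CPU samples. It illustrates the sliding-window idea only and is not Azure Monitor's evaluation engine; the function name `evaluate_windows` and the sample data are hypothetical.

```python
from statistics import mean


def evaluate_windows(samples, window_size=5, frequency=1, threshold=80.0):
    """Return (minute, window_average, fired) for each evaluation tick.

    samples: one CPU percentage per minute (hypothetical data).
    window_size and frequency are expressed in minutes.
    """
    results = []
    # The first evaluation happens once a full window of samples exists.
    for end in range(window_size, len(samples) + 1, frequency):
        window = samples[end - window_size:end]
        avg = mean(window)
        results.append((end, avg, avg > threshold))
    return results


cpu = [70, 75, 78, 85, 90, 95, 92, 60, 55, 50]
for minute, avg, fired in evaluate_windows(cpu):
    print(f"minute {minute}: avg={avg:.1f} fired={fired}")
```

In this data the first raw sample above 80 arrives at minute 4, but the 5-minute average does not exceed 80 until minute 6. That delay is the smoothing a longer window buys, at the cost of slower detection.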
Log Alerts (KQL-Based)
Log alerts run a KQL query on a schedule and fire when results meet a threshold:
# The quoted name in --condition is a placeholder that --condition-query binds to a KQL query.
# "count" counts the rows the query returns, so the query returns the failed requests themselves.
az monitor scheduled-query create \
--name "HighErrorRate" \
--resource-group myRG \
--scopes "/subscriptions/{sub}/resourceGroups/myRG/providers/Microsoft.Insights/components/myAppInsights" \
--condition "count 'FailedRequests' > 50 resource id _ResourceId" \
--condition-query FailedRequests="requests | where success == false" \
--window-size 5m \
--evaluation-frequency 5m \
--severity 1 \
--action-groups "/subscriptions/{sub}/resourceGroups/myRG/providers/Microsoft.Insights/actionGroups/OpsTeam"
Common Log Alert Queries
Exception spike:
exceptions
| where timestamp > ago(5m)
| summarize exceptionCount = count() by type
| where exceptionCount > 10
Dependency failure rate:
dependencies
| where timestamp > ago(15m)
| summarize total = count(), failed = countif(success == false)
| extend failRate = (failed * 100.0) / total
| where failRate > 5
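The failure-rate arithmetic in the dependency query is easy to sanity-check offline before choosing a threshold. The following Python sketch mirrors `failed * 100.0 / total` and the `> 5` filter; the helper names and the decision to treat zero traffic as "do not fire" are assumptions made for illustration, not behavior taken from the query engine.

```python
def fail_rate_percent(total, failed):
    """Mirror `failed * 100.0 / total`; return None when there were no calls."""
    if total == 0:
        return None
    return failed * 100.0 / total


def should_fire(total, failed, threshold_percent=5.0):
    """True when the failure rate is strictly above the threshold."""
    rate = fail_rate_percent(total, failed)
    return rate is not None and rate > threshold_percent


for total, failed in [(200, 12), (200, 10), (2, 1), (0, 0)]:
    print(total, failed, fail_rate_percent(total, failed), should_fire(total, failed))
```

Note the `(2, 1)` case: a single failure out of two calls is a 50% rate, so a percentage threshold on its own can be noisy when traffic is low.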
Action Groups
Action Groups define WHO gets notified and HOW:
Notification Types
| Type | Description |
|---|---|
| Email | Send to individual or distribution list |
| SMS | Text message to phone number |
| Voice | Automated phone call |
| Push | Azure mobile app notification |
| Azure Function | Trigger a function for auto-remediation |
| Logic App | Start a workflow (create ticket, post to Teams) |
| Webhook | POST to any HTTP endpoint |
| ITSM | ServiceNow, Provance integration |
Create an Action Group (CLI)
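# usecommonalertschema opts each receiver into the Common Alert Schema payload.
az monitor action-group create \
--name "OpsTeam" \
--resource-group myRG \
--short-name "Ops" \
--action email ops-lead ops@company.com usecommonalertschema \
--action webhook pagerduty "https://events.pagerduty.com/integration/xxx/enqueue" usecommonalertschema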
Bicep
resource actionGroup 'Microsoft.Insights/actionGroups@2023-01-01' = {
  name: 'OpsTeam'
  location: 'global'
  properties: {
    groupShortName: 'Ops'
    enabled: true
    emailReceivers: [
      { name: 'ops-lead', emailAddress: 'ops@company.com', useCommonAlertSchema: true }
    ]
    webhookReceivers: [
      { name: 'pagerduty', serviceUri: 'https://events.pagerduty.com/integration/xxx/enqueue', useCommonAlertSchema: true }
    ]
  }
}
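Because both receivers set `useCommonAlertSchema: true`, whatever sits behind the webhook receives a payload whose `data.essentials` block has the same shape for every alert type. The Python sketch below shows one way a receiver might pull out the fields it needs. The function name `summarize_alert` is hypothetical, and the sample payload is invented and heavily trimmed; a real payload carries more fields.

```python
import json


def summarize_alert(body: str) -> dict:
    """Extract a few commonly used fields from a Common Alert Schema payload."""
    payload = json.loads(body)
    if payload.get("schemaId") != "azureMonitorCommonAlertSchema":
        raise ValueError("not a Common Alert Schema payload")
    essentials = payload["data"]["essentials"]
    return {
        "rule": essentials["alertRule"],
        "severity": essentials["severity"],           # for example "Sev2"
        "condition": essentials["monitorCondition"],  # "Fired" or "Resolved"
        "targets": essentials.get("alertTargetIDs", []),
    }


# Invented, trimmed sample payload for illustration only.
sample = json.dumps({
    "schemaId": "azureMonitorCommonAlertSchema",
    "data": {
        "essentials": {
            "alertRule": "HighCPU-AppService",
            "severity": "Sev2",
            "monitorCondition": "Fired",
            "alertTargetIDs": [
                "/subscriptions/{sub}/resourcegroups/myrg/providers/microsoft.web/serverfarms/myplan"
            ],
        },
        "alertContext": {},
    },
})

print(summarize_alert(sample))
```

Checking `schemaId` first lets the handler reject payloads from receivers that were not opted into the common schema instead of failing later on a missing key.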
Alert Processing Rules
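Alert processing rules act on alerts after they fire. They can suppress notifications or add action groups, filtered by scope and schedule:
- Suppression (rule type RemoveAllActionGroups): silence notifications during maintenance windows; the alerts themselves still fire and are recorded
- Apply action group (rule type AddActionGroups): add an action group to matching alerts, for example to route weekend alerts to the on-call team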
az monitor alert-processing-rule create \
--name "MaintenanceWindow" \
--resource-group myRG \
--scopes "/subscriptions/{sub}/resourceGroups/myRG" \
--rule-type RemoveAllActionGroups \
--schedule-recurrence-type Weekly \
--schedule-recurrence Sunday \
--schedule-recurrence-start-time "02:00:00" \
--schedule-recurrence-end-time "06:00:00" \
--schedule-time-zone "UTC"
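The recurrence above describes a weekly window: Sundays from 02:00 to 06:00 UTC. As a rough way to reason about which alerts would be silenced, this Python sketch checks whether a timestamp falls inside such a window. The function name `in_maintenance_window` is hypothetical, and treating the end time as exclusive is an assumption, not documented Azure behavior.

```python
from datetime import datetime, time, timezone


def in_maintenance_window(now, weekday=6, start=time(2, 0), end=time(6, 0)):
    """True if the timezone-aware `now` falls in the weekly UTC window.

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    The window is treated as half-open: start <= t < end (an assumption).
    """
    now_utc = now.astimezone(timezone.utc)
    return now_utc.weekday() == weekday and start <= now_utc.time() < end


# 7 January 2024 is a Sunday.
print(in_maintenance_window(datetime(2024, 1, 7, 3, 0, tzinfo=timezone.utc)))
print(in_maintenance_window(datetime(2024, 1, 8, 3, 0, tzinfo=timezone.utc)))
```

Converting to UTC before comparing keeps the check correct when the caller passes a timestamp in another time zone.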
Best Practices
- Use severity levels consistently: Sev 0 = Critical (page), Sev 1 = Error, Sev 2 = Warning (email), Sev 3 = Informational, Sev 4 = Verbose
- Avoid alert fatigue: only page for actionable alerts; use email/Teams for warnings
- Set appropriate window sizes: too short = noisy; too long = slow detection
- Use dynamic thresholds: they learn a metric's historical pattern, including seasonality, so you do not have to hand-tune a static value
- Test action groups: use the "Test" button in the portal to verify notifications work
- Use Common Alert Schema: it standardizes the payload format across all alert types
- Document runbooks: link each alert to a runbook describing remediation steps
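The first two practices amount to a fixed mapping from severity to notification channel, which is worth writing down so that every rule author applies it the same way. The Python sketch below is one hypothetical mapping: the severity labels are Azure Monitor's, Sev 0 to page and Sev 2 to email follow the list above, and the channels for Sev 1, 3, and 4 are assumptions to adapt to your own policy.

```python
# Azure Monitor severity levels with a hypothetical channel for each.
SEVERITY_ROUTES = {
    0: ("Critical", "page"),
    1: ("Error", "page"),            # assumption: errors also page
    2: ("Warning", "email"),
    3: ("Informational", "email"),   # assumption
    4: ("Verbose", "none"),          # assumption: recorded, not notified
}


def channel_for(severity: int) -> str:
    """Return the notification channel for a severity level from 0 to 4."""
    try:
        return SEVERITY_ROUTES[severity][1]
    except KeyError:
        raise ValueError(f"unknown severity: {severity}") from None


for sev, (label, channel) in SEVERITY_ROUTES.items():
    print(f"Sev {sev} ({label}): {channel}")
```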
Key Takeaways
- Metric alerts are best for infrastructure signals; log alerts for application-level conditions
- Action Groups decouple "what to detect" from "who to notify"
- Use alert processing rules for maintenance windows and routing
- Dynamic thresholds reduce manual threshold tuning
- Every alert should have a clear owner and remediation runbook