Alert Rules & Action Groups

Overview

Azure Monitor Alerts proactively notify you when conditions in your telemetry indicate a problem. Combined with Action Groups, they form a complete incident notification pipeline — from detection to response.

Alert Types

Type	Signal Source	Use Case
Metric Alert	Platform metrics, custom metrics	CPU > 80%, response time > 2s
Log Alert	Log Analytics / KQL query	Error count spike, specific exception
Activity Log Alert	Azure control plane	Resource deleted, deployment failed
Smart Detection	Application Insights ML	Anomaly in failure rate, response time

Metric Alerts

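Metric alerts evaluate platform or custom metrics at regular intervals. On each evaluation, the metric is aggregated over a lookback window (the window size) and compared with a threshold; the evaluation frequency sets how often that check runs.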

Azure CLI — Create a Metric Alert

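# CpuPercentage is emitted by the App Service plan (Microsoft.Web/serverfarms), not by the site,
# so the scope points at the plan. "myPlan" is a placeholder name.
az monitor metrics alert create \
  --name "HighCPU-AppService" \
  --resource-group myRG \
  --scopes "/subscriptions/{sub}/resourceGroups/myRG/providers/Microsoft.Web/serverfarms/myPlan" \
  --condition "avg CpuPercentage > 80" \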
  --window-size 5m \
  --evaluation-frequency 1m \
  --severity 2 \
  --action "/subscriptions/{sub}/resourceGroups/myRG/providers/Microsoft.Insights/actionGroups/OpsTeam"
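
To make the roles of --window-size and --evaluation-frequency concrete, here is a minimal Python sketch of a static-threshold evaluation. It is a simplified model for illustration, not Azure Monitor's actual implementation; the function name evaluate_metric_alert and the per-minute sample data are assumptions.

```python
from statistics import mean


def evaluate_metric_alert(samples, window_size=5, threshold=80.0):
    """Return one boolean per evaluation.

    `samples` holds one CPU percentage per minute, so evaluating after
    every new sample models a 1-minute evaluation frequency. Each result
    is True when the average of the last `window_size` samples is above
    `threshold`.
    """
    results = []
    for end in range(window_size, len(samples) + 1):
        window = samples[end - window_size:end]
        results.append(mean(window) > threshold)
    return results


# Hypothetical per-minute readings: one brief spike, then sustained load.
cpu = [50, 95, 50, 50, 50, 90, 90, 90, 90, 90]
print(evaluate_metric_alert(cpu))
# [False, False, False, False, True, True]
```

The single spike to 95% never fires because it is averaged away inside the 5-minute window; only the sustained load does. That is the trade-off behind the window size: a longer window filters out noise but delays detection.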

Bicep / ARM Template

resource cpuAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'HighCPU-AppService'
  location: 'global'
  properties: {
    severity: 2
    enabled: true
    scopes: [appService.id] // assumed to reference the resource that emits CpuPercentage (the App Service plan)
    evaluationFrequency: 'PT1M'
    windowSize: 'PT5M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          name: 'CPUCheck'
          criterionType: 'StaticThresholdCriterion' // required: marks this as a fixed-threshold criterion
          metricName: 'CpuPercentage'
          operator: 'GreaterThan'
          threshold: 80
          timeAggregation: 'Average'
        }
      ]
    }
    actions: [{ actionGroupId: actionGroup.id }]
  }
}

Log Alerts (KQL-Based)

Log alerts run a KQL query on a schedule and fire when results meet a threshold:

# The quoted name in --condition is a placeholder; --condition-query binds it to a KQL query.
# "count" counts the rows the query returns, so the query returns one row per failed request.
az monitor scheduled-query create \
  --name "HighErrorRate" \
  --resource-group myRG \
  --scopes "/subscriptions/{sub}/resourceGroups/myRG/providers/Microsoft.Insights/components/myAppInsights" \
  --condition "count 'FailedRequests' > 50" \
  --condition-query FailedRequests="requests | where success == false" \
  --window-size 5m \
  --evaluation-frequency 5m \
  --severity 1 \
  --action-groups "/subscriptions/{sub}/resourceGroups/myRG/providers/Microsoft.Insights/actionGroups/OpsTeam"

Common Log Alert Queries

Exception spike:

exceptions
| where timestamp > ago(5m)
| summarize exceptionCount = count() by type
| where exceptionCount > 10
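
For readers less familiar with KQL, the same group-count-filter logic can be sketched in Python. This is only an illustration of what the query computes; the function name exception_spikes and the sample rows are hypothetical, and the rows are assumed to be already limited to the last 5 minutes.

```python
from collections import Counter


def exception_spikes(exceptions, threshold=10):
    """Count exceptions per type and keep the types above `threshold`."""
    counts = Counter(e["type"] for e in exceptions)
    return {exc_type: n for exc_type, n in counts.items() if n > threshold}


# Hypothetical telemetry rows from the last 5 minutes.
rows = [{"type": "TimeoutException"}] * 12 + [{"type": "KeyNotFoundException"}] * 3
print(exception_spikes(rows))
# {'TimeoutException': 12}
```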

Dependency failure rate:

dependencies
| where timestamp > ago(15m)
| summarize total = count(), failed = countif(success == false)
| extend failRate = (failed * 100.0) / total
| where failRate > 5
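
The failure-rate arithmetic in this query can be checked with a small Python sketch. The function name dependency_fail_rate and the sample data are assumptions for illustration. Unlike the KQL version, the sketch returns None for an empty window instead of dividing by zero.

```python
def dependency_fail_rate(calls):
    """Return the percentage of failed calls, or None when there are no calls."""
    total = len(calls)
    if total == 0:
        return None  # nothing to evaluate in an empty window
    failed = sum(1 for call in calls if not call["success"])
    return failed * 100.0 / total


# Hypothetical window: 20 dependency calls, 2 of them failed.
calls = [{"success": True}] * 18 + [{"success": False}] * 2
rate = dependency_fail_rate(calls)
print(rate, rate > 5)
# 10.0 True
```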

Action Groups

Action Groups define WHO gets notified and HOW:

Notification and Action Types

Type	Description
Email	Send to individual or distribution list
SMS	Text message to phone number
Voice	Automated phone call
Push	Azure mobile app notification
Azure Function	Trigger a function for auto-remediation
Logic App	Start a workflow (create ticket, post to Teams)
Webhook	POST to any HTTP endpoint
ITSM	ServiceNow, Provance integration

Create an Action Group (CLI)

az monitor action-group create \
  --name "OpsTeam" \
  --resource-group myRG \
  --short-name "Ops" \
  --action email ops-lead ops@company.com \
  --action webhook pagerduty "https://events.pagerduty.com/integration/xxx/enqueue"

Bicep

resource actionGroup 'Microsoft.Insights/actionGroups@2023-01-01' = {
  name: 'OpsTeam'
  location: 'global'
  properties: {
    groupShortName: 'Ops'
    enabled: true
    emailReceivers: [
      { name: 'ops-lead', emailAddress: 'ops@company.com', useCommonAlertSchema: true }
    ]
    webhookReceivers: [
      { name: 'pagerduty', serviceUri: 'https://events.pagerduty.com/integration/xxx/enqueue', useCommonAlertSchema: true }
    ]
  }
}
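
With useCommonAlertSchema enabled, a webhook receiver gets the same JSON envelope for every alert type. The sketch below shows how a receiver might pull out the fields it needs for routing. It assumes the documented Common Alert Schema layout (schemaId plus data.essentials); the function name summarize_alert is hypothetical, and the sample payload is trimmed and made up.

```python
import json


def summarize_alert(body: str) -> dict:
    """Extract routing fields from a Common Alert Schema payload."""
    payload = json.loads(body)
    if payload.get("schemaId") != "azureMonitorCommonAlertSchema":
        raise ValueError("not a Common Alert Schema payload")
    essentials = payload["data"]["essentials"]
    return {
        "rule": essentials["alertRule"],
        "severity": essentials["severity"],            # for example "Sev2"
        "condition": essentials["monitorCondition"],   # "Fired" or "Resolved"
        "targets": essentials.get("alertTargetIDs", []),
    }


# Trimmed, hypothetical payload; real payloads carry many more fields.
sample = json.dumps({
    "schemaId": "azureMonitorCommonAlertSchema",
    "data": {
        "essentials": {
            "alertRule": "HighCPU-AppService",
            "severity": "Sev2",
            "monitorCondition": "Fired",
            "alertTargetIDs": ["/subscriptions/{sub}/resourcegroups/myrg"],
        },
        "alertContext": {},
    },
})
print(summarize_alert(sample))
```

Rejecting payloads without the expected schemaId keeps the receiver from silently misreading alerts sent in a legacy, type-specific format.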

Alert Processing Rules

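Alert processing rules modify what happens after an alert fires, based on scope, filters, or schedule. The alert is still recorded; the rule only removes or adds the action groups that would be invoked: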

Suppression — Silence alerts during maintenance windows
Action group override — Route weekend alerts to on-call team

az monitor alert-processing-rule create \
  --name "MaintenanceWindow" \
  --resource-group myRG \
  --scopes "/subscriptions/{sub}/resourceGroups/myRG" \
  --rule-type RemoveAllActionGroups \
  --schedule-recurrence-type Weekly \
  --schedule-recurrence Sunday \
  --schedule-recurrence-start-time "02:00:00" \
  --schedule-recurrence-end-time "06:00:00" \
  --schedule-time-zone "UTC"
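
The weekly schedule above amounts to a simple time check. The following Python sketch models it for the Sunday 02:00 to 06:00 UTC window. It is an illustration only: the function name in_maintenance_window is hypothetical, timestamps are assumed to be timezone-aware, and treating the end time as exclusive is an assumption rather than documented Azure behavior.

```python
from datetime import datetime, time, timezone


def in_maintenance_window(ts: datetime) -> bool:
    """True when `ts` falls on a Sunday between 02:00 and 06:00 UTC."""
    ts = ts.astimezone(timezone.utc)   # normalize aware timestamps to UTC
    is_sunday = ts.weekday() == 6      # Monday is 0, Sunday is 6
    return is_sunday and time(2, 0) <= ts.time() < time(6, 0)


# 2024-01-07 is a Sunday.
print(in_maintenance_window(datetime(2024, 1, 7, 3, 0, tzinfo=timezone.utc)))  # True
print(in_maintenance_window(datetime(2024, 1, 8, 3, 0, tzinfo=timezone.utc)))  # False
```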

Best Practices

Use severity levels consistently — Sev 0 = critical (page), Sev 1 = error, Sev 2 = warning (email), Sev 3 = informational, Sev 4 = verbose
Avoid alert fatigue — Only page for actionable alerts; use email/Teams for warnings
Set appropriate window sizes — Too short = noisy; too long = slow detection
Use dynamic thresholds — Thresholds learned from a metric's historical behavior adapt to recurring patterns, such as daily or weekly seasonality, without manual tuning
Test action groups — Use the "Test" button in the portal to verify notifications work
Use Common Alert Schema — Standardizes payload format across all alert types
Document runbooks — Link each alert to a runbook describing remediation steps
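
One way to keep severity handling consistent is to write the routing policy down as code. The sketch below is a hypothetical policy, not an Azure API: the function name channels_for and the channel names are assumptions, and the mapping should be adapted to your own on-call process.

```python
def channels_for(severity: int) -> list[str]:
    """Map an Azure Monitor severity (0 to 4) to notification channels."""
    if not 0 <= severity <= 4:
        raise ValueError("severity must be between 0 and 4")
    if severity <= 1:
        return ["page", "email"]   # actionable now: wake someone up
    if severity == 2:
        return ["email"]           # warning: review during working hours
    return []                      # low severities: visible in the portal only


for sev in range(5):
    print(sev, channels_for(sev))
```

Keeping paging limited to the highest severities is what prevents alert fatigue: anything that cannot be acted on immediately goes to a lower-urgency channel.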

Key Takeaways

Metric alerts are best for infrastructure signals; log alerts for application-level conditions
Action Groups decouple "what to detect" from "who to notify"
Use alert processing rules for maintenance windows and routing
Dynamic thresholds reduce manual threshold tuning
Every alert should have a clear owner and remediation runbook