KQL Queries for Troubleshooting
Overview
Kusto Query Language (KQL) is the query language used across Azure Monitor, Log Analytics, Application Insights, and Microsoft Sentinel. Mastering KQL is essential for diagnosing production issues quickly.
Log Analytics Workspace
All Application Insights data flows into a Log Analytics workspace (workspace-based mode). Key tables:
| Table | Contains |
|---|---|
requests | Incoming HTTP requests |
dependencies | Outbound calls (SQL, HTTP, queues) |
exceptions | Unhandled and tracked exceptions |
traces | Custom log messages (ILogger output) |
customEvents | TrackEvent calls |
customMetrics | TrackMetric calls |
performanceCounters | CPU, memory, GC stats |
Basic Query Patterns
Find Failed Requests (Last 24h)
requests
| where timestamp > ago(24h)
| where success == false
| summarize count() by resultCode, name
| order by count_ desc
Slow Dependencies
dependencies
| where timestamp > ago(1h)
| where duration > 2000
| project timestamp, target, name, duration, resultCode
| order by duration desc
| take 50
Exception Breakdown
exceptions
| where timestamp > ago(24h)
| summarize count() by type, outerMessage
| order by count_ desc
| take 20
Common Query Patterns
Percentile Response Times
requests
| where timestamp > ago(1h)
| summarize
p50 = percentile(duration, 50),
p90 = percentile(duration, 90),
p99 = percentile(duration, 99)
by bin(timestamp, 5m)
| render timechart
Error Rate Over Time
requests
| where timestamp > ago(24h)
| summarize
total = count(),
failed = countif(success == false)
by bin(timestamp, 15m)
| extend errorRate = (failed * 100.0) / total
| project timestamp, errorRate
| render timechart
Dependency Failure Correlation
dependencies
| where timestamp > ago(1h)
| where success == false
| summarize failCount = count() by target, name, resultCode
| order by failCount desc
Joins — Correlating Data
Link Failed Requests to Their Dependencies
requests
| where timestamp > ago(1h)
| where success == false
| project operation_Id, reqName = name, reqDuration = duration
| join kind=inner (
dependencies
| where timestamp > ago(1h)
| project operation_Id, depTarget = target, depName = name, depDuration = duration, depSuccess = success
) on operation_Id
| where depSuccess == false
| project reqName, depTarget, depName, depDuration
| take 100
Trace Messages for a Specific Operation
let opId = "abc123-operation-id";
union requests, dependencies, traces, exceptions
| where operation_Id == opId
| order by timestamp asc
| project timestamp, itemType, message = coalesce(message, name, type), duration
Advanced Patterns
Anomaly Detection
requests
| where timestamp > ago(7d)
| make-series reqCount = count() on timestamp step 1h
| extend anomalies = series_decompose_anomalies(reqCount)
| mv-expand timestamp to typeof(datetime), reqCount to typeof(long), anomalies to typeof(int)
| where anomalies != 0
Top Slow Endpoints with Impact Score
requests
| where timestamp > ago(24h)
| summarize
avgDuration = avg(duration),
count = count(),
p95 = percentile(duration, 95)
by name
| extend impactScore = avgDuration * count
| order by impactScore desc
| take 10
Unique Users Affected by Errors
requests
| where timestamp > ago(24h)
| where success == false
| summarize
affectedUsers = dcount(user_Id),
errorCount = count()
by name, resultCode
| order by affectedUsers desc
Query Best Practices
- Always filter by time first —
where timestamp > ago(...)reduces scan volume - Use
projectearly — Drop unnecessary columns to improve performance - Prefer
summarizeover raw rows — Aggregations are faster and more useful - Use
letfor reusable variables — Improves readability
let timeRange = ago(1h);
let threshold = 1000;
requests
| where timestamp > timeRange
| where duration > threshold
- Use
renderfor visualization —timechart,barchart,piechart - Limit results with
take— Avoid pulling millions of rows to the client - Use
hasovercontains—hasis case-insensitive and uses term indexing (faster)
// Fast — uses index
traces | where message has "timeout"
// Slow — substring scan
traces | where message contains "time"
Saving and Sharing Queries
- Query Packs — Save queries to a shared pack in Log Analytics
- Functions — Save a query as a function for reuse across queries
- Workbooks — Embed KQL queries in interactive dashboards
- Alerts — Use KQL as the basis for log-based alert rules
Key Takeaways
- KQL is the universal query language for Azure Monitor data
- Always start with time filters and use
summarizefor aggregations - Joins via
operation_Idlet you correlate requests, dependencies, and traces - Use
hasinstead ofcontainsfor better performance - Save frequently-used queries as functions or in query packs