KQL Queries for Troubleshooting

Overview

Kusto Query Language (KQL) is the query language used across Azure Monitor, Log Analytics, Application Insights, and Microsoft Sentinel. Mastering KQL is essential for diagnosing production issues quickly.

Log Analytics Workspace

All Application Insights data flows into a Log Analytics workspace (workspace-based mode). Key tables:

Table	Contains
`requests`	Incoming HTTP requests
`dependencies`	Outbound calls (SQL, HTTP, queues)
`exceptions`	Unhandled and tracked exceptions
`traces`	Custom log messages (ILogger output)
`customEvents`	TrackEvent calls
`customMetrics`	TrackMetric calls
`performanceCounters`	CPU, memory, GC stats

Basic Query Patterns

Find Failed Requests (Last 24h)

requests
| where timestamp > ago(24h)
| where success == false
| summarize count() by resultCode, name
| order by count_ desc

Slow Dependencies

dependencies
| where timestamp > ago(1h)
| where duration > 2000
| project timestamp, target, name, duration, resultCode
| order by duration desc
| take 50

Exception Breakdown

exceptions
| where timestamp > ago(24h)
| summarize count() by type, outerMessage
| order by count_ desc
| take 20

Common Query Patterns

Percentile Response Times

requests
| where timestamp > ago(1h)
| summarize
    p50 = percentile(duration, 50),
    p90 = percentile(duration, 90),
    p99 = percentile(duration, 99)
  by bin(timestamp, 5m)
| render timechart

Error Rate Over Time

requests
| where timestamp > ago(24h)
| summarize
    total = count(),
    failed = countif(success == false)
  by bin(timestamp, 15m)
| extend errorRate = (failed * 100.0) / total
| project timestamp, errorRate
| render timechart

Dependency Failure Correlation

dependencies
| where timestamp > ago(1h)
| where success == false
| summarize failCount = count() by target, name, resultCode
| order by failCount desc

Joins — Correlating Data

Link Failed Requests to Their Dependencies

requests
| where timestamp > ago(1h)
| where success == false
| project operation_Id, reqName = name, reqDuration = duration
| join kind=inner (
    dependencies
    | where timestamp > ago(1h)
    | project operation_Id, depTarget = target, depName = name, depDuration = duration, depSuccess = success
) on operation_Id
| where depSuccess == false
| project reqName, depTarget, depName, depDuration
| take 100

Trace Messages for a Specific Operation

let opId = "abc123-operation-id";
union requests, dependencies, traces, exceptions
| where operation_Id == opId
| order by timestamp asc
| project timestamp, itemType, message = coalesce(message, name, type), duration

Advanced Patterns

Anomaly Detection

requests
| where timestamp > ago(7d)
| make-series reqCount = count() on timestamp step 1h
| extend anomalies = series_decompose_anomalies(reqCount)
| mv-expand timestamp to typeof(datetime), reqCount to typeof(long), anomalies to typeof(int)
| where anomalies != 0

Top Slow Endpoints with Impact Score

requests
| where timestamp > ago(24h)
| summarize
    avgDuration = avg(duration),
    count = count(),
    p95 = percentile(duration, 95)
  by name
| extend impactScore = avgDuration * count
| order by impactScore desc
| take 10

Unique Users Affected by Errors

requests
| where timestamp > ago(24h)
| where success == false
| summarize
    affectedUsers = dcount(user_Id),
    errorCount = count()
  by name, resultCode
| order by affectedUsers desc

Query Best Practices

Always filter by time first — where timestamp > ago(...) reduces scan volume
Use project early — Drop unnecessary columns to improve performance
Prefer summarize over raw rows — Aggregations are faster and more useful
Use let for reusable variables — Improves readability

let timeRange = ago(1h);
let threshold = 1000;
requests
| where timestamp > timeRange
| where duration > threshold

Use render for visualization — timechart, barchart, piechart
Limit results with take — Avoid pulling millions of rows to the client
Use has over contains — has is case-insensitive and uses term indexing (faster)

// Fast — uses index
traces | where message has "timeout"

// Slow — substring scan
traces | where message contains "time"

Saving and Sharing Queries

Query Packs — Save queries to a shared pack in Log Analytics
Functions — Save a query as a function for reuse across queries
Workbooks — Embed KQL queries in interactive dashboards
Alerts — Use KQL as the basis for log-based alert rules

Key Takeaways

KQL is the universal query language for Azure Monitor data
Always start with time filters and use summarize for aggregations
Joins via operation_Id let you correlate requests, dependencies, and traces
Use has instead of contains for better performance
Save frequently-used queries as functions or in query packs