KQL Queries for Troubleshooting

Overview

Kusto Query Language (KQL) is the query language used across Azure Monitor, Log Analytics, Application Insights, and Microsoft Sentinel. Mastering KQL is essential for diagnosing production issues quickly.

Log Analytics Workspace

All Application Insights data flows into a Log Analytics workspace (workspace-based mode). Key tables:

TableContains
requestsIncoming HTTP requests
dependenciesOutbound calls (SQL, HTTP, queues)
exceptionsUnhandled and tracked exceptions
tracesCustom log messages (ILogger output)
customEventsTrackEvent calls
customMetricsTrackMetric calls
performanceCountersCPU, memory, GC stats

Basic Query Patterns

Find Failed Requests (Last 24h)

requests
| where timestamp > ago(24h)
| where success == false
| summarize count() by resultCode, name
| order by count_ desc

Slow Dependencies

dependencies
| where timestamp > ago(1h)
| where duration > 2000
| project timestamp, target, name, duration, resultCode
| order by duration desc
| take 50

Exception Breakdown

exceptions
| where timestamp > ago(24h)
| summarize count() by type, outerMessage
| order by count_ desc
| take 20

Common Query Patterns

Percentile Response Times

requests
| where timestamp > ago(1h)
| summarize
    p50 = percentile(duration, 50),
    p90 = percentile(duration, 90),
    p99 = percentile(duration, 99)
  by bin(timestamp, 5m)
| render timechart

Error Rate Over Time

requests
| where timestamp > ago(24h)
| summarize
    total = count(),
    failed = countif(success == false)
  by bin(timestamp, 15m)
| extend errorRate = (failed * 100.0) / total
| project timestamp, errorRate
| render timechart

Dependency Failure Correlation

dependencies
| where timestamp > ago(1h)
| where success == false
| summarize failCount = count() by target, name, resultCode
| order by failCount desc

Joins — Correlating Data

Link Failed Requests to Their Dependencies

requests
| where timestamp > ago(1h)
| where success == false
| project operation_Id, reqName = name, reqDuration = duration
| join kind=inner (
    dependencies
    | where timestamp > ago(1h)
    | project operation_Id, depTarget = target, depName = name, depDuration = duration, depSuccess = success
) on operation_Id
| where depSuccess == false
| project reqName, depTarget, depName, depDuration
| take 100

Trace Messages for a Specific Operation

let opId = "abc123-operation-id";
union requests, dependencies, traces, exceptions
| where operation_Id == opId
| order by timestamp asc
| project timestamp, itemType, message = coalesce(message, name, type), duration

Advanced Patterns

Anomaly Detection

requests
| where timestamp > ago(7d)
| make-series reqCount = count() on timestamp step 1h
| extend anomalies = series_decompose_anomalies(reqCount)
| mv-expand timestamp to typeof(datetime), reqCount to typeof(long), anomalies to typeof(int)
| where anomalies != 0

Top Slow Endpoints with Impact Score

requests
| where timestamp > ago(24h)
| summarize
    avgDuration = avg(duration),
    count = count(),
    p95 = percentile(duration, 95)
  by name
| extend impactScore = avgDuration * count
| order by impactScore desc
| take 10

Unique Users Affected by Errors

requests
| where timestamp > ago(24h)
| where success == false
| summarize
    affectedUsers = dcount(user_Id),
    errorCount = count()
  by name, resultCode
| order by affectedUsers desc

Query Best Practices

  1. Always filter by time firstwhere timestamp > ago(...) reduces scan volume
  2. Use project early — Drop unnecessary columns to improve performance
  3. Prefer summarize over raw rows — Aggregations are faster and more useful
  4. Use let for reusable variables — Improves readability
let timeRange = ago(1h);
let threshold = 1000;
requests
| where timestamp > timeRange
| where duration > threshold
  1. Use render for visualizationtimechart, barchart, piechart
  2. Limit results with take — Avoid pulling millions of rows to the client
  3. Use has over containshas is case-insensitive and uses term indexing (faster)
// Fast — uses index
traces | where message has "timeout"

// Slow — substring scan
traces | where message contains "time"

Saving and Sharing Queries

  • Query Packs — Save queries to a shared pack in Log Analytics
  • Functions — Save a query as a function for reuse across queries
  • Workbooks — Embed KQL queries in interactive dashboards
  • Alerts — Use KQL as the basis for log-based alert rules

Key Takeaways

  • KQL is the universal query language for Azure Monitor data
  • Always start with time filters and use summarize for aggregations
  • Joins via operation_Id let you correlate requests, dependencies, and traces
  • Use has instead of contains for better performance
  • Save frequently-used queries as functions or in query packs