Top 7 Azure Integration Architect Interview Questions — And How to Answer Them

By Sujit Kumar Das | Senior Azure Integration Engineer | AZ-204 Certified
Based on real interview experience with senior Azure Integration experts

If you have 6–10 years of Azure Integration Services experience and are targeting Senior / Lead / Architect roles, these are the questions you will almost certainly face. I've compiled the 7 most common deep-dive questions from my interviews — along with the answers that actually impress a 15-year Azure integration expert.

Question 1 — Service Bus vs Event Grid vs Event Hub: When do you use which?

The question

"Walk me through a real scenario where you had to choose between Service Bus and Event Grid — and explain exactly why you picked one over the other."

The answer

	Service Bus	Event Grid	Event Hub
Use when	Transactional messaging, guaranteed delivery	React to events/state changes, fan-out notifications	High-throughput streaming, telemetry, logs
Delivery	At-least-once (peek-lock), FIFO with sessions, dead-letter queue	Push-based, at-least-once with retries, optional dead-lettering	Pull-based, consumer groups, replay
Example	Order processing, payment workflows	"Blob uploaded → trigger function"	Clickstream, IoT telemetry, audit logs

Real scenario — peak-hour order processing:

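For high-volume transactional order processing, use Service Bus, preferably on the Premium tier for dedicated capacity and predictable latency. Orders are transactional: each one must take effect exactly once. Service Bus itself delivers at-least-once through the peek-lock mechanism: when a consumer picks up a message it is locked, processed, then completed. If processing fails or the lock expires, the message becomes visible again and is redelivered, so the consumer must be idempotent (or the queue must use duplicate detection) to get exactly-once results. Once the delivery count exceeds the configured maximum, the message moves to the Dead Letter Queue (DLQ), where failed messages are parked for inspection instead of being silently lost.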
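
The peek-lock behaviour described above can be sketched with a toy in-memory queue. This is a minimal illustration in plain Python, not the Azure SDK: `InMemoryQueue`, `Message`, and `drain` are hypothetical names, and the set of processed IDs stands in for whatever idempotency store a real consumer would use.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    message_id: str
    body: str
    delivery_count: int = 0


class InMemoryQueue:
    """Toy stand-in for a queue with peek-lock, max delivery count, and a DLQ."""

    def __init__(self, max_delivery_count: int = 3):
        self.max_delivery_count = max_delivery_count
        self.active = deque()
        self.dead_letter = []

    def send(self, message: Message) -> None:
        self.active.append(message)

    def receive(self):
        """Lock the next message for one consumer; returns None when empty."""
        if not self.active:
            return None
        message = self.active.popleft()
        message.delivery_count += 1
        return message

    def complete(self, message: Message) -> None:
        """Processing succeeded: the locked message is removed for good."""
        # Nothing to do here: the message left the active queue when it was locked.

    def abandon(self, message: Message) -> None:
        """Processing failed: release the lock, or dead-letter after too many tries."""
        if message.delivery_count >= self.max_delivery_count:
            self.dead_letter.append(message)
        else:
            self.active.append(message)


def drain(queue: InMemoryQueue, handler, processed_ids: set) -> None:
    """Consume until the queue is empty; processed_ids makes handling idempotent."""
    while (message := queue.receive()) is not None:
        if message.message_id in processed_ids:
            queue.complete(message)  # duplicate delivery: skip the side effects
            continue
        try:
            handler(message)
        except Exception:
            queue.abandon(message)
        else:
            processed_ids.add(message.message_id)
            queue.complete(message)
```

With `max_delivery_count=3`, a message whose handler always raises is delivered three times and then lands in `dead_letter`, while healthy messages are completed on the first pass.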

Event Hub is the wrong fit for orders: it has no dead-lettering and no per-message lock, and offset management makes per-message recovery complex for transactional workloads. Use Event Hub for streams of millions of events per second where individual events do not need their own settlement or retry: clickstream, IoT sensor data, application logs.

One-liner: "Service Bus for transactional messaging where every message matters. Event Hub for streaming where throughput matters."

Question 2 — APIM Policy Design: Three clients, different requirements

The question

"You have three API consumers — an internal mobile app, a third-party partner, and a public web app. Each has different security and throttling requirements. How do you design the APIM policy structure without duplicating code everywhere?"

The answer

Use three concepts together:

1. Products + Subscriptions: Create three separate APIM Products, one per client type. Each gets its own subscription key, rate limits, and access scope.

2. Policy Fragments (solving duplication): Define common logic once as a reusable Policy Fragment:

<!-- Define once as "jwt-validation" fragment -->
<fragment>
  <validate-jwt header-name="Authorization" failed-validation-httpcode="401">
    <openid-config url="https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration"/>
    <required-claims>
      <claim name="aud"><value>api://your-app-id</value></claim>
    </required-claims>
  </validate-jwt>
</fragment>

<!-- Reference in each product policy -->
<inbound>
  <include-fragment fragment-id="jwt-validation" />
  <include-fragment fragment-id="common-logging" />
  <rate-limit calls="100" renewal-period="60"/>
</inbound>

3. JWT claims instead of subscription keys: Avoid branching with choose/when on raw subscription keys; keys rotate, and the policy breaks when they do. Instead, read the appid claim from the already-validated JWT:

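<!-- Headers values are string arrays, so read the header as a single string before parsing it. -->
<!-- appid is present in v1.0 access tokens; v2.0 tokens carry azp instead. -->
<set-variable name="clientType"
  value="@(context.Request.Headers.GetValueOrDefault("Authorization", "")
    .AsJwt()?.Claims.GetValueOrDefault("appid", ""))" />
<choose>
  <!-- rate-limit is counted per subscription and allowed once per policy definition, -->
  <!-- so per-client limits keyed on a claim use rate-limit-by-key. -->
  <when condition="@(context.Variables.GetValueOrDefault<string>("clientType") == "mobile-app-id")">
    <rate-limit-by-key calls="200" renewal-period="60"
      counter-key="@(context.Variables.GetValueOrDefault<string>("clientType"))" />
  </when>
  <when condition="@(context.Variables.GetValueOrDefault<string>("clientType") == "partner-app-id")">
    <rate-limit-by-key calls="50" renewal-period="60"
      counter-key="@(context.Variables.GetValueOrDefault<string>("clientType"))" />
  </when>
</choose>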
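
To make the per-client limits in the policy above concrete, here is a rough Python model of a counter keyed by client ID. It is a fixed-window simplification written for illustration only; it does not reproduce APIM's actual throttling algorithm, and `FixedWindowLimiter` is a hypothetical name.

```python
import time


class FixedWindowLimiter:
    """Counts calls per client ID inside a fixed time window."""

    def __init__(self, limits, default_limit, window_seconds=60, clock=time.monotonic):
        self.limits = limits                # client_id -> calls allowed per window
        self.default_limit = default_limit  # used for clients without an explicit limit
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable so tests need not wait
        self._windows = {}                  # client_id -> (window_start, calls_so_far)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        start, calls = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            start, calls = now, 0           # window expired: start a fresh one
        limit = self.limits.get(client_id, self.default_limit)
        if calls >= limit:
            self._windows[client_id] = (start, calls)
            return False                    # the gateway would answer 429 here
        self._windows[client_id] = (start, calls + 1)
        return True
```

For example, `FixedWindowLimiter({"mobile-app-id": 200, "partner-app-id": 50}, default_limit=100)` mirrors the numbers used in the policy snippets: the partner is refused on its 51st call within a minute, independently of how many calls the mobile app has made.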

One-liner: "Policy Fragments for reusability, JWT claims for stable client identification, Named Values for environment-specific config."

Question 3 — Logic Apps Standard vs Consumption: How do you decide?

The question

"When a client comes to you with a new integration requirement, what are the first questions you ask before deciding which hosting model to use?"

The answer

Three questions to ask every client:

What is the frequency? If it runs once a week or less, Consumption's pay-per-execution billing is cost-effective. If it runs continuously or at high volume, Standard's predictable pricing wins.
What is your payload complexity and business logic? Simple transforms, file redirections, 10–15 actions → Consumption. Heavy business logic, multiple destinations, built-in (in-process) connectors, a mix of stateful and stateless workflows → Standard.
Do your backend systems sit inside a private VNet? This is the hard differentiator. Consumption runs on shared multi-tenant infrastructure and cannot join a VNet. Standard supports VNet integration for outbound calls and private endpoints for inbound calls.
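
The three questions above can be read as a small decision function. The sketch below is one possible encoding in Python; the field names and the order of the checks are assumptions made for illustration, not a rule from Microsoft.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IntegrationRequirement:
    needs_private_network: bool      # backends reachable only inside a VNet
    heavy_business_logic: bool       # complex logic, many destinations
    high_volume_or_continuous: bool  # runs continuously or at high volume


def choose_hosting_model(req: IntegrationRequirement) -> Tuple[str, str]:
    """Return (hosting model, reason), checking the hardest constraint first."""
    if req.needs_private_network:
        return "Standard", "backends are only reachable over a private network"
    if req.heavy_business_logic:
        return "Standard", "workflow complexity favours the single-tenant runtime"
    if req.high_volume_or_continuous:
        return "Standard", "predictable pricing beats pay-per-execution at volume"
    return "Consumption", "simple, public, low-frequency workload"
```

The VNet check comes first because it is the only answer that rules Consumption out completely; the other two are cost and maintainability trade-offs.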

The VNet blocker — critical for regulated industries:

In banking, healthcare, and other regulated industries, backend systems sit inside private VNets by compliance requirement. Consumption cannot reach them directly. Standard runs on single-tenant infrastructure, integrates with the client's VNet, and makes outbound calls through a private subnet, keeping that traffic off the public internet.

Other Standard advantages:

Multiple workflows per single app — easier CI/CD management
Stateful and stateless workflow options
Local development via VS Code extension

One-liner: "Consumption for simple, public, low-frequency integrations. Standard when you need VNet, private endpoints, multi-workflow management, or regulated industry compliance."

Question 4 — Zero-Trust Security: Managed Identity end to end

The question

"Walk me through exactly how a Logic App authenticates to a downstream REST API securely — without storing any credentials anywhere."

The answer

The Managed Identity flow:

Logic App triggers
      ↓
Requests token from the platform's local managed identity endpoint
(handled by the platform: no credentials, no secrets)
      ↓
Microsoft Entra ID (formerly Azure AD) verifies the Logic App's identity
and issues an OAuth 2.0 Bearer token
      ↓
Logic App attaches token to the API call
Authorization: Bearer <token>
      ↓
Downstream API validates the token's signature, issuer, and audience
      ↓
API call succeeds — zero credentials stored anywhere
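
In a Logic App you only select Managed Identity as the authentication type and the platform performs the token request for you. To show what that request looks like, the sketch below builds it by hand, assuming the App Service style identity endpoint (the `IDENTITY_ENDPOINT` and `IDENTITY_HEADER` environment variables, API version `2019-08-01`). The function names are hypothetical, and nothing is sent over the network.

```python
import os
from urllib.parse import urlencode


def build_token_request(resource, env=os.environ, client_id=None):
    """Return (url, headers) for a managed identity token request.

    resource  -- the audience of the downstream API, e.g. "api://your-app-id"
    client_id -- set only when a user-assigned identity should be used
    """
    params = {"resource": resource, "api-version": "2019-08-01"}
    if client_id:
        params["client_id"] = client_id
    url = f"{env['IDENTITY_ENDPOINT']}?{urlencode(params)}"
    # The header value is injected by the platform; it is not a stored secret.
    headers = {"X-IDENTITY-HEADER": env["IDENTITY_HEADER"]}
    return url, headers


def bearer_header(access_token):
    """Header the caller attaches to the downstream API call."""
    return {"Authorization": f"Bearer {access_token}"}
```

A GET to the returned URL with those headers yields a JSON document whose `access_token` field is what goes into `bearer_header`. In application code you would normally leave this to an identity library rather than build the request yourself.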

System Assigned vs User Assigned Managed Identity:

	System Assigned	User Assigned
Created	Automatically with the resource	Manually, independently
Lifecycle	Dies when resource is deleted	Lives independently
Shared across resources	No — one resource only	Yes — many resources can share it
Best for	Single resource, simple scenario	Enterprise, multiple resources needing same permissions

Enterprise scenario: At LTIMindtree we had 5+ Logic Apps and multiple Function Apps all needing access to the same Key Vault and Service Bus. We created one User Assigned Managed Identity, granted RBAC roles once, and attached it to all resources. New integrations just attached the same identity — no new permission grants needed.

Completing the zero-trust picture with Key Vault: Managed Identity eliminates stored credentials. Key Vault eliminates hardcoded secrets and config. The Logic App uses its Managed Identity to retrieve secrets from Key Vault at runtime, so nothing sensitive lives in workflow definitions, app settings, or source control.

One-liner: "Managed Identity eliminates credentials. Key Vault eliminates hardcoded config. Together they give you true zero-trust."

Question 5 — CI/CD for Azure Integration Services

The question

"How do you structure your CI/CD pipeline for Azure Integration Services — and how do you handle environment-specific config across dev, staging, and production without hardcoding anything?"

The answer

Pipeline structure:

Code commit to Git
      ↓
CI Pipeline — validate Bicep templates, lint APIM policies
      ↓
Deploy to Dev (automatic)
      ↓
Deploy to Staging (automatic)
      ↓
Automated smoke tests
      ↓
Manual approval gate
      ↓
Deploy to Production

Three-layer environment config strategy:

Layer 1 (DevOps Variable Groups): Store environment-specific values (API base URLs, resource names, tenant IDs) in one Variable Group per environment. The same pipeline code runs everywhere; only the Variable Group switches.
Layer 2 (APIM Named Values): Keep environment-specific config such as backend URLs and audience values out of the policy XML by referencing Named Values from it. The pipeline updates the Named Values during deployment.
Layer 3 (Key Vault for secrets): Actual secrets never touch the pipeline. They live in Key Vault and are accessed at runtime via Managed Identity.
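
One check that fits the CI "lint APIM policies" step is verifying that every Named Value a policy references is defined for every environment. The sketch below assumes policies reference Named Values with the `{{name}}` syntax and that each environment's values are available to the pipeline as a dictionary; the function names are hypothetical.

```python
import re

# Named Value names may contain letters, digits, periods, dashes, and underscores.
NAMED_VALUE_PATTERN = re.compile(r"\{\{\s*([A-Za-z0-9_.\-]+)\s*\}\}")


def referenced_named_values(policy_xml: str) -> set:
    """Collect every {{name}} reference found in a policy document."""
    return set(NAMED_VALUE_PATTERN.findall(policy_xml))


def missing_named_values(policy_xml: str, environments: dict) -> dict:
    """Return {environment: [missing names]} for environments with gaps."""
    needed = referenced_named_values(policy_xml)
    gaps = {}
    for env_name, defined in environments.items():
        missing = sorted(needed - set(defined))
        if missing:
            gaps[env_name] = missing
    return gaps
```

Failing the build when `missing_named_values` returns a non-empty dictionary catches a forgotten production value at commit time rather than at deployment time.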

Safe production deployments:

All resources defined in Bicep/ARM templates stored in Git — fully repeatable deployments
Blue-green slot deployments for critical Logic Apps (Standard only; Consumption has no deployment slots): deploy to a staging slot, validate, swap. Rollback is a single slot swap with no redeployment
Manual approval gate only opens after automated smoke tests pass on staging

One-liner: "Same code, same pipeline, same templates — only the Variable Group changes per environment. Secrets never touch the pipeline."

Question 6 — Observability: Proactive monitoring and KQL

The question

"Give me a real scenario where your observability setup caught a production integration failure before the business noticed it. And write me a KQL query that finds all failed Logic App runs in the last 24 hours."

The answer

Three proactive monitoring signals — beyond email alerting:

Email on failure is reactive. Layer these proactive signals on top:

Throughput drop alert — Track records processed per run. If a run processes less than 70% of the baseline average, alert immediately — even if the run technically succeeded. Sudden throughput drop signals upstream data issues before full failure.
Duration anomaly alert — Set a baseline for run duration. If a run exceeds 2x the average, alert immediately — this usually means a downstream API is degrading or a Service Bus queue is backing up.
Dead Letter Queue depth monitoring — Monitor DLQ depth in real time. A growing DLQ means messages are failing silently without triggering the Logic App failure path at all — completely missed by catch scope email alerting.
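
The three signals above reduce to simple comparisons against a baseline. A minimal Python sketch follows, using the 70% and 2x thresholds from the text; the function names and the choice of a plain average as the baseline are assumptions.

```python
from statistics import mean


def throughput_dropped(records_processed, baseline_counts, threshold=0.70):
    """True when a run processed less than 70% of the baseline average."""
    return records_processed < threshold * mean(baseline_counts)


def duration_anomalous(duration_seconds, baseline_durations, factor=2.0):
    """True when a run took more than twice the baseline average."""
    return duration_seconds > factor * mean(baseline_durations)


def dlq_growing(depth_samples):
    """True when the newest DLQ depth sample exceeds the oldest one."""
    return len(depth_samples) >= 2 and depth_samples[-1] > depth_samples[0]
```

In practice these comparisons would live in alert rules rather than application code, but writing them out makes the thresholds easy to review and to tune.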

Production-grade KQL query for failed Logic App runs:

AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where ResourceProvider == "MICROSOFT.LOGIC"
| where Category == "WorkflowRuntime"
// Keep run-level events only; without this filter failed actions and triggers are returned too
| where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
| where status_s == "Failed"
| project
    TimeGenerated,
    WorkflowName = resource_workflowName_s,
    RunID = resource_runId_s,
    ErrorMessage = error_message_s,
    ErrorCode = error_code_s
| order by TimeGenerated desc

Bonus — DLQ depth monitoring query:

AzureMetrics
| where ResourceProvider == "MICROSOFT.SERVICEBUS"
| where MetricName == "DeadletteredMessages"
| where TimeGenerated >= ago(1h)
| summarize MaxDLQ = max(Maximum) by Resource, bin(TimeGenerated, 5m)
| where MaxDLQ > 0
| order by TimeGenerated desc

One-liner: "Email on failure catches what broke. Throughput drop, duration anomaly, and DLQ depth alerts catch what's about to break."

Question 7 — HLD Architecture Design: End-to-end enterprise integration

The question

"A retail client has an on-premises ERP, a cloud-based order management system, a third-party logistics REST API, and a customer mobile app. Design the integration architecture for end-to-end order processing."

The answer

High Level Design:

Mobile App
    ↓
Azure APIM  ←── JWT auth, rate limiting, payload normalization
    ↓
Azure Service Bus (Order Created Topic)
    ↓
    ├──→ Logic App 1 → Order Management System (Cloud)
    ├──→ Logic App 2 → On-premises ERP (via On-premises Data Gateway)
    └──→ Logic App 3 → Third-party Logistics REST API
                            ↓
                      Event Grid (Delivery Scheduled Event)
                            ↓
                      Logic App 4 → Mobile Push Notification

Component decisions:

APIM as front door: JWT validation, rate limiting, payload normalization. It is the single entry point, so backend changes are invisible to the mobile app.
Service Bus Topic with 3 subscriptions: decouples the mobile app from all backend systems. The customer gets instant acknowledgement even if ERP or logistics is temporarily unavailable, because accepted orders stay in each subscription until its consumer processes them.
Logic Apps as integration workers: one per subscription, each transforming and routing to its target system. Logic App 2 uses the On-premises Data Gateway to reach the private ERP without opening inbound firewall ports.
Event Grid for notifications: when logistics confirms delivery, Logic App 3 publishes to Event Grid, which pushes the event to the push notification Logic App. Event Grid retries failed deliveries, and a late or missed notification is tolerable here, so no transactional guarantee is needed.
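
The decoupling that the topic provides comes from each subscription holding its own copy of every message. A toy Python model of that fan-out is sketched below; `Topic` and the subscription names are hypothetical, and no Azure SDK is involved.

```python
from collections import deque


class Topic:
    """Toy publish/subscribe topic: one queue per subscription."""

    def __init__(self, subscription_names):
        self.subscriptions = {name: deque() for name in subscription_names}

    def publish(self, message: dict) -> None:
        for queue in self.subscriptions.values():
            queue.append(dict(message))  # each subscription gets its own copy

    def receive(self, subscription_name: str):
        """Take the next message for one subscription, or None when empty."""
        queue = self.subscriptions[subscription_name]
        return queue.popleft() if queue else None

    def backlog(self, subscription_name: str) -> int:
        return len(self.subscriptions[subscription_name])
```

If the ERP consumer is offline, its backlog simply grows while the order management and logistics consumers keep draining their own subscriptions.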

Failure handling at every stage:

Stage	Failure strategy
APIM	Retry with exponential backoff before returning error to mobile app
Service Bus	Max delivery count (3 retries) → auto dead-letter → DLQ depth alert
Logic App	Try-catch scope → log to Application Insights with order ID and run ID → ops alert
Logistics API	Logic App retry policy: 3 attempts at fixed 30s intervals → DLQ → manual intervention workflow
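
The retry row for the Logistics API (3 attempts at 30s intervals) corresponds to a fixed-interval retry. A minimal Python sketch follows, with the sleep function injectable so it can be exercised without waiting; `call_with_retry` is a hypothetical helper, not a Logic Apps or Polly API.

```python
import time


def call_with_retry(operation, attempts=3, interval_seconds=30, sleep=time.sleep):
    """Call operation() up to `attempts` times, pausing between failures.

    Re-raises the last error once every attempt has failed, which is the
    point where the caller would dead-letter the message.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:
            last_error = error
            if attempt < attempts:
                sleep(interval_seconds)  # no pause after the final attempt
    raise last_error
```

Only transient failures (timeouts, 429, 5xx) are worth retrying; a 4xx validation error will fail identically on every attempt and is better dead-lettered straight away.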

Security across the architecture: Every Logic App uses Managed Identity to authenticate to Service Bus, Key Vault, and internal APIs. On-premises ERP connection goes through On-premises Data Gateway over outbound HTTPS — no inbound firewall ports opened. Third-party logistics API credentials stored in Key Vault, retrieved at runtime.

One-liner: "APIM as the front door, Service Bus as the decoupling backbone, Logic Apps as integration workers, Event Grid for notifications, Managed Identity for security, and DLQ for failure safety, so failed messages are parked for review instead of lost."

Final thoughts

These questions test whether you think like an architect or like a developer. The difference is:

Developers answer "what does this do"
Architects answer "when would I use this over that, and what happens when it fails"

Always speak in trade-offs. Always bring real numbers. Always have a failure story ready.