Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.

Prototyping AI Voice Orchestration with Azure Communication Services and Azure AI Voice Live

posted on September 29, 2025 | tags: [ AI, Azure, Voice, MCP, Orchestration, Prototype ]
Signals and routing for modern operations – Mark Roxberry
Exploring how Azure Communication Services, Azure AI Voice Live, and an orchestration layer built on FastAPI can reshape operations at scale.

Prototyping Voice Orchestration

I’ve started an early prototype to explore AI-driven orchestration of live voice interactions. The vision is a voice front door that listens, classifies, and routes requests automatically, while handing off smoothly to humans when needed.

Unlike a single monolithic agent, this architecture separates the telephony, orchestration, and AI speech layers.

The Architecture in Layers

  • Phone Network + ACS Front End
    Calls enter through a PSTN number anchored in Azure Communication Services (ACS).

    • Call Automation handles signaling and events.
    • Media Streaming streams audio via WebSockets for real-time processing.
  • Orchestration App (FastAPI)
    A Python FastAPI application running in Azure Container Apps.

    • Hosts the Coordinator Agent (classifies intent with a RAG query to Azure AI Search).
    • Runs the External Operations Agent (integrated with the 3rd Party MCP) and the Internal Operations Agent (integrated with the Internal Operations MCP).
    • Applies policies and routes each call domain appropriately.
    • Handles human-in-the-loop escalation when confidence is low or policy requires it.
  • Azure AI Voice Live
    Provides ultra-low-latency speech-to-speech AI.

    • Can be configured in Agent Mode (agent defined in Azure AI Foundry, formerly Azure AI Studio).
    • Or can act as a speech front end, invoked by the Orchestration App as one of its services (agent-to-agent).
    • Converts caller audio to text, runs reasoning, and streams back voice responses.
  • MCP Servers
    Each domain agent accesses external systems through Model Context Protocol servers.

    • External Ops → 3rd Party Platform Support MCP (work orders, assets).
    • Internal Ops → Internal Operations MCP (ServiceNow, Jira SM).
    • MCPs isolate secrets and standardize tool contracts.
  • Observability Layer
    Telemetry, logging, and FinOps metrics captured with Azure Monitor, App Insights, and Log Analytics.
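
To make the Media Streaming hop concrete, here is a minimal sketch of how the WebSocket handler might decode one incoming message. It assumes each message is JSON text with a `kind` field and base64-encoded audio under `audioData.data`; those field names are my reading of the ACS audio streaming format and should be checked against the current documentation. `AudioFrame` and `parse_media_message` are hypothetical names, not part of any SDK.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioFrame:
    pcm: bytes    # raw audio bytes decoded from base64
    silent: bool  # silence flag, if the sender provided one


def parse_media_message(raw: str) -> Optional[AudioFrame]:
    """Return an AudioFrame for audio messages, None for anything else."""
    msg = json.loads(raw)
    if msg.get("kind") != "AudioData":
        return None  # e.g. a metadata message describing the stream
    body = msg.get("audioData", {})
    return AudioFrame(
        pcm=base64.b64decode(body.get("data", "")),
        silent=bool(body.get("silent", False)),
    )


# Hand-built messages for illustration (not captured from a real call).
sample = json.dumps({
    "kind": "AudioData",
    "audioData": {
        "data": base64.b64encode(b"\x00\x01\x02\x03").decode("ascii"),
        "silent": False,
    },
})
frame = parse_media_message(sample)
metadata = parse_media_message(json.dumps({"kind": "AudioMetadata"}))
```

In the prototype, the decoded bytes would be forwarded to Voice Live; keeping the parsing in one small function makes it easy to adjust if the message shape differs.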

Challenges

  • Splitting responsibilities: ACS handles telephony, Voice Live handles speech, but the orchestration logic (agents + routing) lives in the FastAPI app. That means designing clean interfaces and avoiding duplicated logic.
  • Agent-to-agent calling: Do we let Voice Live run its own agent, or should the Coordinator Agent call Voice Live as a sub-agent? This is still an open design choice.
  • Latency: Each orchestration hop adds delay, so the path needs tuning to keep the caller-perceived round trip under one second.
  • Tool integration: MCP servers must be hardened and trustworthy, since they front vendor APIs.
  • Learning loop: Capturing unresolved calls for future training without storing unnecessary PII.
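
For the learning loop, one option is to redact obvious identifiers before an unresolved call is stored. The sketch below is illustrative only: the two regular expressions are assumptions that catch simple email addresses and phone numbers, and real PII detection would need a dedicated service. `redact` and `unresolved_record` are hypothetical helpers.

```python
import re

# Illustrative patterns only; they will miss many real-world formats.
_PATTERNS = [
    ("[EMAIL]", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("[PHONE]", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text: str) -> str:
    """Replace each matched identifier with a fixed placeholder."""
    for placeholder, pattern in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def unresolved_record(transcript: str, reason: str) -> dict:
    """Build the record kept for later review; only redacted text is stored."""
    return {"reason": reason, "transcript": redact(transcript)}


record = unresolved_record(
    "Call me at +1 (555) 010-2345 or jo@example.com about the pump.",
    "low_confidence",
)
```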

Milestones

Milestone 0 – Foundations

  • Gather requirements with stakeholders.
  • Define system boundaries (telephony, orchestration, AI, MCPs).
  • Design architecture diagrams (call flow, agent orchestration, MCP layers).
  • Establish ecosystem: repos, CI/CD pipeline, dev/test ACS number, Key Vault setup.

Milestone 1 – End-to-End Skeleton

  • Stand up ACS Call Automation + Media Streaming.
  • Connect FastAPI orchestration app.
  • Pass a simple audio call through ACS → Coordinator Agent → dummy MCP server, then return a canned response to the caller.
  • This “wiring test” proves ACS, media handling, orchestration loop, and MCP integration are functioning together.
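
The wiring test can be sketched without any Azure dependency by stubbing each layer. Everything here is hypothetical scaffolding (`DummyMcpServer`, `StubCoordinator`, `wiring_test`); the transcript is a placeholder string rather than real speech recognition, since the goal is only to show the loop closing.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    ok: bool
    payload: dict


class DummyMcpServer:
    """Stands in for a real MCP server; echoes whatever it is given."""

    def call_tool(self, name: str, arguments: dict) -> ToolResult:
        return ToolResult(ok=True, payload={"tool": name, "echo": arguments})


class StubCoordinator:
    """No classification yet: every call goes to the dummy server."""

    def __init__(self, mcp: DummyMcpServer) -> None:
        self.mcp = mcp

    def handle(self, transcript: str) -> str:
        result = self.mcp.call_tool("ping", {"transcript": transcript})
        if not result.ok:
            return "Sorry, something went wrong."
        return "Thanks, your request was received."


def wiring_test(audio_frames: list[bytes]) -> str:
    # Placeholder for speech-to-text: just describe how much audio arrived.
    transcript = f"<{sum(len(f) for f in audio_frames)} bytes of audio>"
    return StubCoordinator(DummyMcpServer()).handle(transcript)


canned = wiring_test([b"\x00" * 320, b"\x00" * 320])
```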

Milestone 2 – Coordinator and Routing

  • Implement Coordinator Agent with RAG query to Azure AI Search.
  • Route calls to External Ops Agent, Internal Ops Agent, or Human in the Loop based on classification.
  • Log routing decisions and confidence scores.
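
One way the routing step could work, as a sketch: treat each retrieved passage as a vote for a domain, take the winning domain's share of the total score as the confidence, and fall back to a human below a threshold. The `Passage` shape, the share-of-score confidence, and the 0.6 threshold are all assumptions for illustration; the post does not specify how confidence is computed.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("coordinator")


@dataclass
class Passage:
    domain: str   # "external_ops" or "internal_ops"
    score: float  # retrieval relevance, assumed non-negative


def route(passages: list[Passage], threshold: float = 0.6) -> tuple[str, float]:
    """Pick a domain from retrieved passages, or 'human' if unsure."""
    totals: dict[str, float] = {}
    for p in passages:
        totals[p.domain] = totals.get(p.domain, 0.0) + p.score
    total = sum(totals.values())
    if total <= 0:
        decision, confidence = "human", 0.0  # nothing usable was retrieved
    else:
        best = max(totals, key=totals.get)
        confidence = totals[best] / total
        decision = best if confidence >= threshold else "human"
    logger.info("route=%s confidence=%.2f", decision, confidence)
    return decision, confidence


decision, confidence = route([
    Passage("external_ops", 0.9),
    Passage("external_ops", 0.7),
    Passage("internal_ops", 0.2),
])
tie = route([Passage("external_ops", 0.5), Passage("internal_ops", 0.5)])
```

Logging the decision alongside the confidence in one place keeps the audit trail consistent no matter which branch is taken.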

Milestone 3 – Domain Agents with MCP Integration

  • External Ops Agent integrated with 3rd Party MCP (work orders, assets).
  • Internal Ops Agent integrated with Internal Operations MCP (incidents, service desk tasks).
  • Verify round-trip tool call flows.
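
A round-trip tool call can be checked at the message level. MCP is built on JSON-RPC 2.0, and tools are invoked with the `tools/call` method; the sketch below builds such a request and reads a text result. The tool name `get_work_order`, its arguments, and the sample reply are hypothetical, and transport (stdio or HTTP) is left out.

```python
import itertools
import json

_ids = itertools.count(1)  # simple request id generator


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request for the MCP tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def read_tool_result(raw: str) -> str:
    """Return the text content of a tool result, raising on errors."""
    msg = json.loads(raw)
    if "error" in msg:
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    result = msg["result"]
    text = "\n".join(
        c["text"] for c in result.get("content", []) if c.get("type") == "text"
    )
    if result.get("isError"):
        raise RuntimeError(text or "tool reported an error")
    return text


# Hypothetical tool and a hand-written reply, for illustration only.
request = json.loads(build_tool_call("get_work_order", {"id": "WO-1"}))
reply = read_tool_result(json.dumps({
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {"content": [{"type": "text", "text": "WO-1: open"}]},
}))
```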

Milestone 4 – Human in the Loop & Case Bundling

  • Warm transfer to call center/IT when confidence is low or policy triggers.
  • Package Case Bundle with last transcript, entities, and tool attempts.
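
A Case Bundle could be as simple as a serializable record handed to the human agent at transfer time. The fields below follow the list above (last transcript, entities, tool attempts); `call_id`, `reason`, and the class names are assumptions added for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolAttempt:
    tool: str
    succeeded: bool
    detail: str = ""


@dataclass
class CaseBundle:
    call_id: str                      # hypothetical identifier for the call
    last_transcript: str
    entities: dict[str, str] = field(default_factory=dict)
    tool_attempts: list[ToolAttempt] = field(default_factory=list)
    reason: str = "low_confidence"    # why the call was escalated

    def to_json(self) -> str:
        # asdict() also converts the nested ToolAttempt records.
        return json.dumps(asdict(self), sort_keys=True)


bundle = CaseBundle(
    call_id="call-001",
    last_transcript="The pump in building 4 is leaking again.",
    entities={"asset": "pump", "location": "building 4"},
    tool_attempts=[ToolAttempt("get_work_order", False, "no matching order")],
)
payload = json.loads(bundle.to_json())
```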

Milestone 5 – Observability and FinOps

  • Logging (latency, ASR accuracy, automation vs. human-in-the-loop (HITL) rate).
  • Dashboards for cost, performance, and error rates.
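
Before any dashboard exists, the headline numbers can be computed from per-call records. This sketch assumes each call logs a latency and who ultimately handled it; `CallRecord` and `summarize` are hypothetical, the percentile uses the nearest-rank method, and ASR accuracy is left out because it needs reference transcripts.

```python
import math
from dataclasses import dataclass


@dataclass
class CallRecord:
    latency_ms: float
    handled_by: str  # "automation" or "human"


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(calls: list[CallRecord]) -> dict:
    if not calls:
        raise ValueError("no calls to summarize")
    humans = sum(1 for c in calls if c.handled_by == "human")
    hitl_rate = humans / len(calls)
    return {
        "p95_latency_ms": percentile([c.latency_ms for c in calls], 95),
        "hitl_rate": hitl_rate,
        "automation_rate": 1 - hitl_rate,
    }


# Made-up records for illustration, not measurements from the prototype.
summary = summarize([
    CallRecord(400, "automation"),
    CallRecord(650, "automation"),
    CallRecord(900, "automation"),
    CallRecord(1200, "human"),
])
```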

Milestone 6 – Testing, Documentation, and Handoff

  • System testing across scenarios (happy path, error handling, fallback).
  • User acceptance / operational readiness.
  • Documentation: architecture, agent contracts, runbooks.
  • Handoff to client operations team.

Diagrams

Architecture Overview

[Diagram: architecture overview. Components and labels recovered from the figure:]

  • Phone Network (PSTN number) → Azure Communication Services Call Automation and ACS Media Streaming (real-time audio).
  • Orchestration App (FastAPI): FastAPI control (webhook events), WebSocket handler (audio I/O), Coordinator Agent, External Operations Agent, Internal Operations Agent.
  • Azure AI Voice Live (speech to speech) and Azure AI Search (RAG query).
  • MCP Servers: 3rd Party MCP, Internal MCP (ServiceNow or Jira SM).
  • Operations Layer: logging, telemetry, and metrics; cost analysis.
  • Human in the loop (call center or IT).
  • Routing labels: external ops, internal ops, unknown or low confidence.

Call Sequence

[Diagram: call sequence. Participants and numbered messages recovered from the figure:]

Participants: PSTN caller, ACS Call Automation, ACS Media Streaming, WebSocket handler, Azure AI Voice Live, FastAPI control endpoint, Coordinator Agent, Azure AI Search, External Ops Agent, Internal Ops Agent, Human in the loop.

  1. Inbound call
  2. Start audio stream
  3. Call events started
  4. New session metadata
  5. Audio frames
  6. Stream audio
  7. Interim transcripts and responses
  8. Response audio
  9. Retrieve domain context
  10. Passages and signals
  11. Route decision: external_ops, internal_ops, or human
  12. [external ops] Invoke tools via MCP 3rd Party
  13. [external ops] Work order id and status
  14. [internal ops] Create or update incident via MCP
  15. [internal ops] Incident id and state
  16. [human] Warm transfer with case bundle
  17. [human] Summary and next actions
  18. Control messages: end or transfer
  19. Final audio response

References

Credits

Quote

  • "Signals and routing for modern operations."

Image

Photo by XT7 Core on Unsplash

This post and/or images used in it may have been created or enhanced using generative AI tools for clarity and organization. However, all ideas, technical work, solutions, integrations, and other aspects described here are entirely my own.