This is a demonstration of more advanced patterns for voice agents, using the OpenAI Realtime API and the OpenAI Agents SDK.
This project uses the OpenAI Agents SDK, a toolkit for building, managing, and deploying advanced AI agents.
For full documentation, guides, and API references, see the official OpenAI Agents SDK Documentation.
NOTE: For a version that does not use the OpenAI Agents SDK, see the branch without-agents-sdk.
There are two main patterns demonstrated:
1. Chat-Supervisor: A realtime-based chat agent interacts with the user and handles basic tasks, while a more intelligent, text-based supervisor model (e.g., gpt-4.1) is used extensively for tool calls and more complex responses. This approach provides an easy onramp and high-quality answers, with a small increase in latency.
2. Sequential Handoff: Specialized agents (powered by the Realtime API) transfer the user between them to handle specific user intents. This is great for customer service, where user intents can be handled sequentially by specialist models that excel in specific domains. This avoids putting all instructions and tools in a single agent, which can degrade performance.
To get started:
- Install dependencies with npm i.
- Add your OPENAI_API_KEY to your env. Either add it to your .bash_profile or equivalent, or copy .env.sample to .env and add it there.
- Start the server with npm run dev.
- Open the app in your browser. It should default to the chatSupervisor Agent Config.

The Chat-Supervisor pattern is demonstrated in the chatSupervisor Agent Config. The chat agent uses the realtime model to converse with the user and handle basic tasks, like greeting the user, casual conversation, and collecting information. A more intelligent, text-based supervisor model (e.g. gpt-4.1) is used extensively to handle tool calls and more challenging responses. You can control the decision boundary by "opting in" specific tasks to the chat agent as desired.
Video walkthrough: https://x.com/noahmacca/status/1927014156152058075
In this exchange, note the immediate response to collect the phone number, and the deferral to the supervisor agent to handle the tool call and formulate the response. There is a gap of ~2s between the end of "give me a moment to check on that." being spoken aloud and the start of "Thanks for waiting. Your last bill...".
sequenceDiagram
participant User
participant ChatAgent as Chat Agent<br/>(gpt-4o-realtime-mini)
participant Supervisor as Supervisor Agent<br/>(gpt-4.1)
participant Tool as Tool
alt Basic chat or info collection
User->>ChatAgent: User message
ChatAgent->>User: Responds directly
else Requires higher intelligence and/or tool call
User->>ChatAgent: User message
ChatAgent->>User: "Let me think"
ChatAgent->>Supervisor: Forwards message/context
alt Tool call needed
Supervisor->>Tool: Calls tool
Tool->>Supervisor: Returns result
end
Supervisor->>ChatAgent: Returns response
ChatAgent->>User: Delivers response
end
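The flow in the diagram can be sketched without any SDK. This is a minimal illustration under stated assumptions, not the repo's implementation: the allow-list entries, `decideRoute`, `handleTurn`, and the `say` and `askSupervisor` callbacks are all hypothetical names introduced here.

```typescript
// Hypothetical sketch of the chat-supervisor decision boundary.
// None of these names come from the repo or the OpenAI Agents SDK.

type Route = 'respond_directly' | 'defer_to_supervisor';

// Tasks the chat agent has been "opted in" to handle itself (assumed values).
const CHAT_AGENT_ALLOW_LIST: ReadonlySet<string> = new Set([
  'greeting',
  'chitchat',
  'collect_info',
]);

// Anything not on the allow list is deferred to the supervisor.
function decideRoute(intent: string): Route {
  return CHAT_AGENT_ALLOW_LIST.has(intent)
    ? 'respond_directly'
    : 'defer_to_supervisor';
}

// Stand-in for a call to a text model such as gpt-4.1 with tools attached.
type Supervisor = (history: string[]) => Promise<string>;

async function handleTurn(
  intent: string,
  history: string[],
  say: (text: string) => void,
  askSupervisor: Supervisor,
): Promise<void> {
  if (decideRoute(intent) === 'respond_directly') {
    say('(chat agent answers directly)');
    return;
  }
  // Speak a filler phrase first so the user is not left in silence,
  // then wait for the supervisor's answer and deliver it.
  say('Let me think.');
  say(await askSupervisor(history));
}
```

Keeping the boundary in one small function makes it easy to opt more tasks in to the chat agent over time.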
This pattern brings the intelligence of models like gpt-4.1 to your voice agents. To adapt it to your own agent:
- Add your domain-specific instructions under the ==== Domain-Specific Agent Instructions ==== marker.
- Describe your tools in chatAgentInstructions. We recommend a brief yaml description rather than json to ensure the model doesn't get confused and try calling the tool directly.
- Adjust the decision boundary in the # Allow List of Permitted Actions section.
- Consider gpt-4o-mini-realtime for the chatAgent and/or gpt-4.1-mini for the supervisor model. To maximize intelligence on particularly difficult or high-stakes tasks, consider trading off latency and adding chain-of-thought to your supervisor prompt, or using an additional reasoning model-based supervisor that uses o4-mini.

The Sequential Handoff pattern is inspired by OpenAI Swarm and involves the sequential handoff of a user between specialized agents. Handoffs are decided by the model and coordinated via tool calls, and possible handoffs are defined explicitly in an agent graph. A handoff triggers a session.update event with new instructions and tools. This pattern is effective for handling a variety of user intents with specialist agents, each of which might have long instructions and numerous tools.
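As a rough sketch of what a handoff involves, the snippet below builds the session.update event for the target agent and rejects handoffs that the agent graph does not declare. The `AgentDef` type and the `buildSessionUpdate` and `handoff` helpers are hypothetical and only illustrate the idea; when you use the Agents SDK, it handles this for you.

```typescript
// Minimal local types for illustration; the SDK and API shapes are richer.
interface ToolDef {
  type: 'function';
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

interface AgentDef {
  name: string;
  instructions: string;
  tools: ToolDef[];
  handoffs: string[]; // names of agents this agent may hand off to
}

interface SessionUpdateEvent {
  type: 'session.update';
  session: { instructions: string; tools: ToolDef[] };
}

// Build the client event that swaps in the new agent's prompt and tools.
function buildSessionUpdate(agent: AgentDef): SessionUpdateEvent {
  return {
    type: 'session.update',
    session: { instructions: agent.instructions, tools: agent.tools },
  };
}

// Only allow handoffs that are declared explicitly in the agent graph.
function handoff(
  graph: Map<string, AgentDef>,
  from: string,
  to: string,
): SessionUpdateEvent {
  const source = graph.get(from);
  const target = graph.get(to);
  if (!source || !target || !source.handoffs.includes(to)) {
    throw new Error(`Handoff from ${from} to ${to} is not in the agent graph`);
  }
  return buildSessionUpdate(target);
}
```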
Here's a video walkthrough showing how it works. You should be able to use this repo to prototype your own multi-agent realtime voice app in less than 20 minutes!
In this simple example, the user is transferred from a greeter agent to a haiku agent. See below for the simple, full configuration of this flow.
Configuration in src/app/agentConfigs/simpleExample.ts
import { RealtimeAgent } from '@openai/agents/realtime';

// Define agents using the OpenAI Agents SDK
export const haikuWriterAgent = new RealtimeAgent({
  name: 'haikuWriter',
  handoffDescription: 'Agent that writes haikus.', // Context for the agent_transfer tool
  instructions:
    'Ask the user for a topic, then reply with a haiku about that topic.',
  tools: [],
  handoffs: [],
});

export const greeterAgent = new RealtimeAgent({
  name: 'greeter',
  handoffDescription: 'Agent that greets the user.',
  instructions:
    "Please greet the user and ask them if they'd like a haiku. If yes, hand off to the 'haikuWriter' agent.",
  tools: [],
  handoffs: [haikuWriterAgent], // Define which agents this agent can hand off to
});

// An Agent Set is just an array of the agents that participate in the scenario
export default [greeterAgent, haikuWriterAgent];
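Since an agent set is a plain array, a small sanity check can catch a handoff target that was left out of it. The helper below is a hypothetical sketch that only relies on the `name` and `handoffs` fields shown above. It uses a local `AgentLike` type rather than the SDK's `RealtimeAgent` so that it stands alone.

```typescript
// Local stand-in for the two agent fields this check needs (an assumption,
// not the SDK's RealtimeAgent type).
interface AgentLike {
  name: string;
  handoffs: AgentLike[];
}

// Return the names of handoff targets that are not part of the agent set.
function findMissingHandoffTargets(agentSet: AgentLike[]): string[] {
  const names = new Set(agentSet.map((agent) => agent.name));
  const missing = new Set<string>();
  for (const agent of agentSet) {
    for (const target of agent.handoffs) {
      if (!names.has(target.name)) missing.add(target.name);
    }
  }
  return Array.from(missing);
}
```

An empty result means every declared handoff points at an agent that participates in the scenario.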
This is a more complex, representative implementation that illustrates a customer service flow, with the following features:
- A more complex agent graph with agents for user authentication, returns, sales, and a placeholder human agent for escalations.
- An escalation by the returns agent to o4-mini to validate and initiate a return, as an example high-stakes decision, using a similar pattern to the above.
- Prompting models to follow a state machine, for example to accurately collect names and phone numbers by confirming them character by character before authenticating a user.
- To test this flow, say that you'd like to return your snowboard and go through the necessary prompts!
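In the repo the state machine lives in the prompt and the model is asked to follow it. To make the transitions concrete, here is a small programmatic analogue for collecting a phone number with character-by-character confirmation. The state names, wording, and helper functions are assumptions for illustration only.

```typescript
// Hypothetical states for collecting and confirming a phone number.
type CollectState = 'ask' | 'confirm' | 'done';

interface Machine {
  state: CollectState;
  value: string; // digits collected so far
}

// Read the value back one character at a time, e.g. "2-0-6".
function spellOut(value: string): string {
  return value.split('').join('-');
}

// Advance the machine by one user utterance and return what to say next.
function step(machine: Machine, userInput: string): { next: Machine; say: string } {
  if (machine.state === 'ask') {
    const digits = userInput.replace(/\D/g, ''); // keep digits only
    return {
      next: { state: 'confirm', value: digits },
      say: `I heard ${spellOut(digits)}. Is that correct?`,
    };
  }
  if (machine.state === 'confirm') {
    if (/^\s*(yes|correct)\b/i.test(userInput)) {
      return {
        next: { state: 'done', value: machine.value },
        say: 'Thank you, that is confirmed.',
      };
    }
    // Any other answer discards the value and asks again.
    return {
      next: { state: 'ask', value: '' },
      say: 'Sorry about that. Please repeat your phone number.',
    };
  }
  return { next: machine, say: 'Your phone number is already confirmed.' };
}
```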
Configuration in src/app/agentConfigs/customerServiceRetail/index.ts.
import authentication from "./authentication";
import returns from "./returns";
import sales from "./sales";
import simulatedHuman from "./simulatedHuman";
import { injectTransferTools } from "../utils";
authentication.downstreamAgents = [returns, sales, simulatedHuman];
returns.downstreamAgents = [authentication, sales, simulatedHuman];
sales.downstreamAgents = [authentication, returns, simulatedHuman];
simulatedHuman.downstreamAgents = [authentication, returns, sales];
const agents = injectTransferTools([
  authentication,
  returns,
  sales,
  simulatedHuman,
]);
export default agents;
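The source does not show what injectTransferTools does internally. One plausible reading, sketched below under that assumption, is that it gives every agent with downstream agents a transferAgents tool whose allowed destinations are those agents. The `AgentConfig` shape, the `publicDescription` field, and the `destination` parameter name (borrowed from the flow diagram) are illustrative, not the repo's actual code.

```typescript
interface ToolDef {
  type: 'function';
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

// Assumed config shape; only downstreamAgents appears in the source.
interface AgentConfig {
  name: string;
  publicDescription: string;
  tools: ToolDef[];
  downstreamAgents?: AgentConfig[];
}

// Hypothetical re-implementation: add one transfer tool per agent.
function injectTransferToolsSketch(agents: AgentConfig[]): AgentConfig[] {
  for (const agent of agents) {
    const downstream = agent.downstreamAgents ?? [];
    const alreadyInjected = agent.tools.some((t) => t.name === 'transferAgents');
    if (downstream.length === 0 || alreadyInjected) continue;

    const agentList = downstream
      .map((d) => `- ${d.name}: ${d.publicDescription}`)
      .join('\n');

    agent.tools.push({
      type: 'function',
      name: 'transferAgents',
      description: `Transfer the user to a more specialized agent.\nAvailable agents:\n${agentList}`,
      parameters: {
        type: 'object',
        properties: {
          // Restrict destinations to the declared downstream agents.
          destination: { type: 'string', enum: downstream.map((d) => d.name) },
        },
        required: ['destination'],
      },
    });
  }
  return agents;
}
```

Restricting the destination to an enum keeps the model from inventing agent names that are not in the graph.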
This diagram illustrates a more advanced interaction flow defined in src/app/agentConfigs/customerServiceRetail/, including detailed events.
sequenceDiagram
participant User
participant WebClient as Next.js Client
participant NextAPI as /api/session
participant RealtimeAPI as OpenAI Realtime API
participant AgentManager as Agents (authentication, returns, sales, simulatedHuman)
participant o1mini as "o4-mini" (Escalation Model)
Note over WebClient: User navigates to ?agentConfig=customerServiceRetail
User->>WebClient: Open Page
WebClient->>NextAPI: GET /api/session
NextAPI->>RealtimeAPI: POST /v1/realtime/sessions
RealtimeAPI->>NextAPI: Returns ephemeral session
NextAPI->>WebClient: Returns ephemeral token (JSON)
Note right of WebClient: Start RTC handshake
WebClient->>RealtimeAPI: Offer SDP (WebRTC)
RealtimeAPI->>WebClient: SDP answer
WebClient->>WebClient: DataChannel "oai-events" established
Note over AgentManager: Default agent is "authentication"
User->>WebClient: "Hi, I'd like to return my snowboard."
WebClient->>AgentManager: conversation.item.create (role=user)
WebClient->>RealtimeAPI: {type: "conversation.item.create"}
WebClient->>RealtimeAPI: {type: "response.create"}
authentication->>AgentManager: Requests user info, calls authenticate_user_information()
AgentManager-->>WebClient: function_call => name="authenticate_user_information"
WebClient->>WebClient: handleFunctionCall => verifies details
Note over AgentManager: After user is authenticated
authentication->>AgentManager: transferAgents("returns")
AgentManager-->>WebClient: function_call => name="transferAgents" args={ destination: "returns" }
WebClient->>WebClient: setSelectedAgentName("returns")
Note over returns: The user wants to process a return
returns->>AgentManager: function_call => checkEligibilityAndPossiblyInitiateReturn
AgentManager-->>WebClient: function_call => name="checkEligibilityAndPossiblyInitiateReturn"
Note over WebClient: The WebClient calls /api/chat/completions with model="o4-mini"
WebClient->>o1mini: "Is this item eligible for return?"
o1mini->>WebClient: "Yes/No (plus notes)"
Note right of returns: Returns uses the result from "o4-mini"
returns->>AgentManager: "Return is approved" or "Return is denied"
AgentManager->>WebClient: conversation.item.create (assistant role)
WebClient->>User: Displays final verdict
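The client-side part of the transfer step in this diagram can be sketched as a pure function: it receives a function call, switches the selected agent when the call is transferAgents, and builds the function_call_output item to send back over the data channel. `handleFunctionCall` and `setSelectedAgentName` are named in the diagram, but the signature, state shape, and the `did_transfer` output field below are assumptions.

```typescript
// A function call as it arrives from the Realtime API: arguments is a JSON string.
interface FunctionCall {
  name: string;
  call_id: string;
  arguments: string;
}

interface ClientState {
  selectedAgentName: string;
}

// Client event that returns a tool result to the conversation.
interface FunctionCallOutputEvent {
  type: 'conversation.item.create';
  item: { type: 'function_call_output'; call_id: string; output: string };
}

function handleFunctionCall(
  call: FunctionCall,
  state: ClientState,
  knownAgents: string[],
): { state: ClientState; reply: FunctionCallOutputEvent } {
  let nextState = state;
  let result: Record<string, unknown> = { handled: false };

  if (call.name === 'transferAgents') {
    const args = JSON.parse(call.arguments) as { destination?: string };
    const destination = args.destination ?? '';
    const didTransfer = knownAgents.includes(destination);
    // Equivalent of setSelectedAgentName(destination) in the diagram.
    if (didTransfer) nextState = { selectedAgentName: destination };
    result = { destination, did_transfer: didTransfer };
  }

  return {
    state: nextState,
    reply: {
      type: 'conversation.item.create',
      item: {
        type: 'function_call_output',
        call_id: call.call_id,
        output: JSON.stringify(result),
      },
    },
  };
}
```

After sending the reply, the client would typically follow up with a response.create event so the newly selected agent continues the conversation.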
- Once you have created a new agent set config, add it to src/app/agentConfigs/index.ts and you should be able to select it in the UI in the "Scenario" dropdown menu.
- By default, tool calls return True, unless you define the toolLogic, which will run your specific tool logic and return an object to the conversation (e.g. for retrieved RAG context).

Assistant messages are checked for safety and compliance before they are shown in the UI. The guardrail call n
$ claude mcp add openai-realtime-agents \
-- python -m otcore.mcp_server <graph>