Voice Agent

The Voice Agent (Bidi Agent) provides real-time bidirectional voice conversations. It streams audio to and from the browser over WebSocket and uses the Amazon Nova Sonic model to understand and generate speech.

Browser (Web Audio API)
│ WebSocket
AgentCore WebSocket Proxy
│ WebSocket
Voice Agent Container (FastAPI + Uvicorn)
│ Strands BidiModel
Amazon Nova Sonic

The agent calls Amazon Nova Sonic over the AWS internal network, which keeps voice latency low. No separate API key is needed because access is granted through the IAM role.

| Item | Value |
|---|---|
| Model | Amazon Nova Sonic |
| API Key | Not required (IAM Role) |
| Latency | Low (AWS internal network) |
| Voices | tiffany, matthew |

Events sent from the browser to the agent:

| Event | Description |
|---|---|
| audio | PCM audio (16 kHz, 1 channel) |
| text | Text input |
| ping | Keep-alive (the agent responds with pong) |
| stop | End session |
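
The input events in the table above (audio, text, ping, stop) could be dispatched roughly as follows. This is a minimal sketch rather than the project's actual handler. It assumes each message is a JSON object with a `type` field, that audio arrives base64-encoded in a `data` field, and that samples are 16-bit. None of these is confirmed by the table, and the returned action dictionaries are a hypothetical internal format.

```python
import base64
import json

SAMPLE_RATE_HZ = 16_000  # from the table: 16 kHz
CHANNELS = 1             # from the table: 1 channel
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM (not stated in the table)


def audio_duration_ms(pcm: bytes) -> float:
    """Duration of a raw PCM chunk under the assumptions above."""
    samples = len(pcm) / (BYTES_PER_SAMPLE * CHANNELS)
    return samples * 1000 / SAMPLE_RATE_HZ


def handle_client_message(raw: str) -> dict:
    """Map one client message to an action description (hypothetical format)."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "audio":
        pcm = base64.b64decode(msg["data"])  # hypothetical field name
        return {"action": "forward_audio", "pcm": pcm, "ms": audio_duration_ms(pcm)}
    if kind == "text":
        return {"action": "forward_text", "text": msg["text"]}  # hypothetical field name
    if kind == "ping":
        # Keep-alive: answer with pong, as the table describes.
        return {"action": "reply", "message": {"type": "pong"}}
    if kind == "stop":
        return {"action": "close"}
    # Anything else is reported back as an error event.
    return {"action": "reply", "message": {"type": "error", "message": f"unknown event: {kind}"}}
```

Under the 16-bit assumption, a 640-byte chunk holds 320 samples, which is 20 ms of audio at 16 kHz.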

Events sent from the agent to the browser:

| Event | Description |
|---|---|
| audio | Response audio (with sample rate) |
| transcript | Text (with role, is_final) |
| tool_use | Tool invocation notification |
| tool_result | Tool execution result |
| connection_start | Connection established |
| response_start / response_complete | Response lifecycle |
| interruption | User interrupted speech |
| error | Error message |
| timeout | Session timeout (default 900 s) |
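
A client consuming the output events (transcript, response_start, interruption, timeout, and so on) might track session state along these lines. This is a sketch under assumptions: messages are taken to be JSON objects with a `type` field, and the `text` and `message` field names are hypothetical. Only `role` and `is_final` come from the table. The `SessionState` class is illustrative and not part of the agent.

```python
import json
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SessionState:
    """Client-side view of one voice session (hypothetical structure)."""
    transcript: List[Tuple[str, str]] = field(default_factory=list)  # (role, text)
    errors: List[str] = field(default_factory=list)
    responding: bool = False
    timed_out: bool = False


def apply_server_event(state: SessionState, raw: str) -> SessionState:
    """Update the session state from one server event."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "transcript":
        # Keep only finalized text; interim results would duplicate lines.
        if event.get("is_final"):
            state.transcript.append((event["role"], event["text"]))
    elif kind == "response_start":
        state.responding = True
    elif kind in ("response_complete", "interruption"):
        # The response ended normally or the user cut in.
        state.responding = False
    elif kind == "error":
        state.errors.append(event.get("message", ""))
    elif kind == "timeout":
        state.timed_out = True
    return state
```

Treating `interruption` like `response_complete` is a design choice of this sketch: in both cases the client stops expecting further audio for the current response.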

The Voice Agent can also use MCP tools.

| Tool | Description |
|---|---|
| getDateAndTimeTool | Get current time in specified timezone |
| DuckDuckGo search | Web search |
| DuckDuckGo fetch_content | Fetch full web page content |
| AgentCore MCP tools | Document search, graph traversal, etc. |

The preferred language is inferred from the browser's timezone.

| Timezone | Language |
|---|---|
| Asia/Seoul | Korean |
| Asia/Tokyo | Japanese |
| Asia/Shanghai | Chinese |
| Europe/Paris | French |
| Europe/Berlin | German |
| America/Sao_Paulo | Portuguese |
| Other | English |
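
The timezone-to-language mapping above amounts to a dictionary lookup with an English fallback. The sketch below copies the table verbatim. The function name `preferred_language` is hypothetical, and how the browser's timezone reaches the agent is not specified here.

```python
from typing import Optional

# Copied from the table above; any other timezone falls back to English.
TIMEZONE_LANGUAGE = {
    "Asia/Seoul": "Korean",
    "Asia/Tokyo": "Japanese",
    "Asia/Shanghai": "Chinese",
    "Europe/Paris": "French",
    "Europe/Berlin": "German",
    "America/Sao_Paulo": "Portuguese",
}


def preferred_language(timezone: Optional[str]) -> str:
    """Return the preferred language for an IANA timezone name."""
    return TIMEZONE_LANGUAGE.get(timezone or "", "English")
```

For example, `preferred_language("Asia/Tokyo")` returns `"Japanese"`, while an unlisted or missing timezone returns `"English"`.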

Conversation transcripts are saved to S3.
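
As an illustration only, saving a transcript could look like the sketch below. The bucket name, key layout, and JSON shape are all assumptions, since the storage format is not described here. The `s3_client` argument is expected to be a boto3 S3 client, whose `put_object` call accepts the `Bucket`, `Key`, `Body`, and `ContentType` parameters used below.

```python
import json
from datetime import datetime, timezone


def save_transcript(s3_client, bucket: str, session_id: str, turns: list) -> str:
    """Upload one session's transcript as JSON and return the object key.

    The key layout and document shape are hypothetical choices for this sketch.
    """
    now = datetime.now(timezone.utc)
    key = f"transcripts/{now:%Y/%m/%d}/{session_id}.json"  # hypothetical layout
    body = json.dumps(
        {"session_id": session_id, "saved_at": now.isoformat(), "turns": turns},
        ensure_ascii=False,  # keep non-English transcripts readable
    )
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body.encode("utf-8"),
        ContentType="application/json",
    )
    return key
```

Passing the client in as a parameter keeps the function easy to test with a stand-in object. In a real deployment it would typically be created once with `boto3.client("s3")`.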