Voice Agent
Overview
Section titled “Overview”The Voice Agent (Bidi Agent) provides real-time bidirectional voice conversations. It streams audio with the browser via WebSocket using the Amazon Nova Sonic model.
Browser (Web Audio API) │ WebSocket ▼AgentCore WebSocket Proxy │ WebSocket ▼Voice Agent Container (FastAPI + Uvicorn) │ Strands BidiModel ▼Amazon Nova SonicVoice Model
Section titled “Voice Model”Uses Amazon Nova Sonic. It operates through the AWS internal network, enabling low-latency voice conversations without a separate API key.
| Item | Value |
|---|---|
| Model | Amazon Nova Sonic |
| API Key | Not required (IAM Role) |
| Latency | Low (AWS internal network) |
| Voices | tiffany, matthew |
WebSocket Events
Section titled “WebSocket Events”Browser → Server
Section titled “Browser → Server”| Event | Description |
|---|---|
audio | PCM audio (16kHz, 1 channel) |
text | Text input |
ping | Keep-alive (responds with pong) |
stop | End session |
Server → Browser
Section titled “Server → Browser”| Event | Description |
|---|---|
audio | Response audio (with sample rate) |
transcript | Text (with role, is_final) |
tool_use | Tool invocation notification |
tool_result | Tool execution result |
connection_start | Connection established |
response_start / response_complete | Response lifecycle |
interruption | User interrupted speech |
error | Error message |
timeout | Session timeout (default 900s) |
The Voice Agent can also use MCP tools.
| Tool | Description |
|---|---|
getDateAndTimeTool | Get current time in specified timezone |
DuckDuckGo search | Web search |
DuckDuckGo fetch_content | Fetch full web page content |
| AgentCore MCP tools | Document search, graph traversal, etc. |
Automatic Language Detection
Section titled “Automatic Language Detection”The preferred language is determined based on the browser’s timezone.
| Timezone | Language |
|---|---|
| Asia/Seoul | Korean |
| Asia/Tokyo | Japanese |
| Asia/Shanghai | Chinese |
| Europe/Paris | French |
| Europe/Berlin | German |
| America/Sao_Paulo | Portuguese |
| Other | English |
Transcript Persistence
Section titled “Transcript Persistence”Conversation transcripts are saved to S3.