Amazon Nova Sonic

AI That Hears How You Speak

Website aws.amazon.com
Overview

What it is

Nova Sonic is Amazon's speech-to-speech AI model on Amazon Bedrock. It picks up on how you speak (tone and pace) and responds with an adaptive, expressive voice in real time.

Intent

I need it when

Deploy voice assistants with extended context awareness

Nova Sonic offers a context window of up to 1M tokens and efficient dialog handling, so voice assistants can maintain conversation history, follow complex instructions, and invoke tools accurately across long interactions in applications such as personal assistants and interactive learning.

Create multilingual voice-enabled applications

Nova Sonic supports seven languages with polyglot voice capabilities and cross-modal interaction (seamless switching between voice and text), allowing developers to build globally accessible voice applications without managing multiple specialized models.

Reduce infrastructure costs for speech-based AI

Nova Sonic delivers industry-leading price-performance by handling speech understanding and speech generation in a single unified model rather than chaining separate ASR and TTS systems, which reduces operational complexity and inference costs for production voice applications.

Build real-time conversational AI for customer support

Amazon Nova Sonic unifies speech understanding and generation in a single model, enabling natural real-time voice interactions with industry-leading latency and pricing. It supports streaming speech input, tool invocation for task completion, and asynchronous task handling to power responsive customer support agents.

Drop

Not a fit when

  • User requires on-premises deployment without cloud infrastructure
  • Application needs support for languages beyond the seven currently supported
  • User requires guaranteed sub-100ms latency for all use cases
  • Organization cannot use Amazon Bedrock or requires vendor-agnostic deployment
  • Use case requires offline-first or disconnected operation

Commercials

Pricing

Pay-as-you-go, billed per audio minute for speech input and output, with text-token pricing for transcription, tool calls, and conversation history.
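
The pricing structure above combines two meters: audio minutes and text tokens. As a rough illustration only, the sketch below adds them up for one session. Every name and number in it is an assumption: `Rates`, `estimate_cost`, the per-1,000-token billing unit, and the single shared text rate are hypothetical, and the rate values are placeholders, not actual Nova Sonic prices. Check the AWS pricing page for real figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder rates in USD. These are NOT real Nova Sonic prices."""
    audio_in_per_min: float    # speech input, per audio minute
    audio_out_per_min: float   # speech output, per audio minute
    text_per_1k_tokens: float  # assumed unit: per 1,000 text tokens


def estimate_cost(rates: Rates, audio_in_min: float,
                  audio_out_min: float, text_tokens: int) -> float:
    """Sum the audio-minute charges and the text-token charge for a session."""
    audio = (audio_in_min * rates.audio_in_per_min
             + audio_out_min * rates.audio_out_per_min)
    text = (text_tokens / 1000) * rates.text_per_1k_tokens
    return round(audio + text, 6)


# Hypothetical rates chosen only to make the arithmetic easy to follow.
example_rates = Rates(audio_in_per_min=0.5,
                      audio_out_per_min=1.0,
                      text_per_1k_tokens=0.25)

# 4 min of input speech, 2 min of output speech, 2,000 text tokens
# (transcription, tool calls, and conversation history combined).
session_cost = estimate_cost(example_rates, 4, 2, 2000)
```

With these placeholder rates the session comes to 4 × 0.5 + 2 × 1.0 + 2 × 0.25 = 4.5. Real billing may split text tokens into separate input and output rates, which this sketch does not model.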