Google Gemini 3.1 Flash TTS

Text-to-speech API with natural language voice direction

Website blog.google

What it is

Google's TTS API with inline audio tags, multi-speaker dialogue, and support for 70+ languages. Aimed at developers building voice agents, dubbing tools, or AI content products on the Gemini API and Vertex AI.

Intent

I need it when

Build multilingual voice applications that serve global audiences with localized, expressive speech

The model supports 70+ languages and generates multi-speaker dialogue natively, with style, pacing, and accent control tuned for major markets. Developers can ship localized, expressive speech at global scale without building a separate model per language.
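As a rough illustration of the multi-speaker capability described above, the sketch below assembles a dialogue transcript and a request body. The JSON field names follow the shape the Gemini API has used for speech generation requests, but they are an assumption here and should be checked against the current documentation. The voice names ("Kore", "Puck") and the helper functions are illustrative only. The code builds the payload and does not call the API.

```python
import json


def build_transcript(turns):
    """Join (speaker, text) pairs into a 'Speaker: text' transcript."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def build_multi_speaker_request(turns, voices):
    """Build a JSON-serializable request body for a multi-speaker dialogue.

    turns  -- list of (speaker, text) tuples, in speaking order
    voices -- dict mapping each speaker name to a voice name (assumed values)
    """
    speakers = {speaker for speaker, _ in turns}
    missing = speakers - set(voices)
    if missing:
        raise ValueError(f"no voice assigned for: {sorted(missing)}")

    # Field names below are an assumption modeled on the Gemini API's
    # speech generation request format; verify before use.
    return {
        "contents": [{"parts": [{"text": build_transcript(turns)}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {
                    "speakerVoiceConfigs": [
                        {
                            "speaker": name,
                            "voiceConfig": {
                                "prebuiltVoiceConfig": {"voiceName": voices[name]}
                            },
                        }
                        for name in sorted(speakers)
                    ]
                }
            },
        },
    }


if __name__ == "__main__":
    turns = [("Ana", "Hola, ¿cómo estás?"), ("Ben", "Doing well, thanks!")]
    body = build_multi_speaker_request(turns, {"Ana": "Kore", "Ben": "Puck"})
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Keeping the transcript and the per-speaker voice map in one payload means a single request can mix languages and speakers, which is the point of not maintaining one model per language.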

Create expressive, natural-sounding AI-generated speech with fine-grained control over vocal style and delivery

Gemini 3.1 Flash TTS delivers improved speech quality, with an Elo score of 1,211 on the Artificial Analysis TTS leaderboard. Audio tags are natural-language commands that control vocal style, pace, tone, and accent, giving developers fine-grained direction over delivery in character-driven applications.
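To make the idea of inline audio tags concrete, here is a minimal sketch of composing tagged text before sending it to the model. The square-bracket tag syntax and the specific tag wording ("whispering", "laughs") are assumptions for illustration, since the source does not spell out the exact format. Both helper functions are hypothetical and only manipulate strings.

```python
import re

# Assumption: tags are written inline as [natural language direction].
_TAG_PATTERN = re.compile(r"\[[^\[\]]*\]\s*")


def tagged(text, *directions):
    """Prefix text with one bracketed tag per direction."""
    for direction in directions:
        if not direction.strip() or "[" in direction or "]" in direction:
            raise ValueError(f"invalid direction: {direction!r}")
    tags = " ".join(f"[{d.strip()}]" for d in directions)
    return f"{tags} {text}" if tags else text


def strip_tags(script):
    """Remove bracketed tags, e.g. to produce plain captions."""
    return _TAG_PATTERN.sub("", script).strip()


if __name__ == "__main__":
    line = tagged("The door was already open.", "whispering", "slow pace")
    print(line)
    print(strip_tags(line))
```

Keeping a `strip_tags` counterpart is useful when the same script feeds both the speech request and on-screen captions, where the directions should not appear.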

Generate AI audio content while maintaining transparency and preventing misinformation

All audio generated by Gemini 3.1 Flash TTS is watermarked with SynthID, an imperceptible watermark interwoven into the audio output that enables reliable detection of AI-generated content to help prevent misinformation.

Integrate consistent, recognizable AI voices across multiple projects and platforms programmatically

Developers can configure voices in Google AI Studio with scene direction, speaker-level specificity, and inline tags, then export the exact parameters as Gemini API code. Because the exported code carries those exact parameters, the same voice can be reproduced across projects and platforms.
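One way to keep exported voice settings reproducible across projects is to store them as an immutable, versionable record. The sketch below is an illustration of that idea, not the format Google AI Studio exports: the `VoiceProfile` class and its fields (`voice_name`, `scene_direction`, `language`) are hypothetical names chosen for this example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical container for the parameters that define one voice."""

    voice_name: str
    scene_direction: str
    language: str

    def to_json(self):
        # sort_keys gives a stable serialization for diffing and hashing.
        return json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)

    @classmethod
    def from_json(cls, raw):
        return cls(**json.loads(raw))

    def fingerprint(self):
        """Short hash that changes whenever any parameter changes."""
        digest = hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()
        return digest[:12]


if __name__ == "__main__":
    narrator = VoiceProfile(
        voice_name="Kore",  # assumed voice name, for illustration
        scene_direction="Calm documentary narrator, unhurried pace",
        language="en-US",
    )
    restored = VoiceProfile.from_json(narrator.to_json())
    print(restored == narrator, narrator.fingerprint())
```

Checking the fingerprint in each project before generating audio is a cheap way to confirm that every platform is using the same voice definition.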

Access a high-quality, cost-effective text-to-speech API for production applications

Artificial Analysis places Gemini 3.1 Flash TTS in its 'most attractive quadrant', which combines high speech quality with low cost. It is available via the Gemini API, Google AI Studio, Vertex AI (for enterprises), and Google Vids (for Workspace users).

Drop

Not a fit when

User requires offline text-to-speech without internet connectivity or API calls
User needs guaranteed pricing transparency and cost predictability before implementation
User requires support for a language outside the 70+ supported
User cannot accept AI-generated audio watermarking or SynthID detection markers
User needs real-time speech generation with sub-100ms latency

Commercials

Pricing

Pricing not specified