Voxtral Transcribe 2 by Mistral

Real-time speech-to-text with speaker diarization

Overview

What it is

- We’re committed to empowering the AI community with open technology. Our open models set the bar for efficiency and are available for free under a fully permissive license.
- Our optimized commercial models are designed for performance and are available via our flexible deployment options.

Intent

I need it when

Reduce costs for large-scale audio production compared to human voice talent or traditional TTS services

Voxtral TTS offers transparent per-character pricing ($0.016/1k characters) with no minimum commitments, allowing cost-effective scaling of voice generation for high-volume content production

Generate natural-sounding voice audio from written text for accessibility or content creation

Voxtral TTS converts text to high-quality speech with voice cloning capabilities, enabling users to create audio content, improve accessibility for visually impaired users, or produce multilingual voice-overs at scale via API integration

Clone or customize voices for branded audio content or personalized user experiences

Voxtral TTS includes voice cloning functionality, enabling organizations to create consistent branded voices or personalized audio experiences across multiple applications and touchpoints

Automate voice generation for customer service, IVR systems, or interactive applications

Voxtral TTS integrates into production workflows through Mistral's API, allowing developers to programmatically generate voices for chatbots, automated announcements, and interactive systems with per-character billing

Drop

Not a fit when

  • User needs speech-to-text transcription rather than text-to-speech generation, as Voxtral TTS is explicitly a text-to-speech model
  • User requires on-device processing without API calls, since Voxtral is accessed via Mistral's cloud API
  • User needs real-time transcription of live audio streams, as the product is designed for batch text-to-speech conversion
  • User operates in a region with restricted access to Mistral's EU-hosted infrastructure
  • User requires guaranteed SLA and priority support without an Enterprise plan contract

Commercials

Pricing

Pay-per-use API pricing: $0.016 per 1,000 characters for text-to-speech generation
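
To make the per-character rate concrete, here is a minimal sketch of a cost estimate in Python. It uses only the $0.016 per 1,000 characters figure quoted above. The function name `estimate_tts_cost` is hypothetical, and the sketch assumes that every character in the input, including whitespace and punctuation, is billed. The page does not say how characters are counted, so treat the result as a rough estimate rather than an invoice.

```python
from decimal import Decimal

# Rate quoted on this page: $0.016 per 1,000 characters.
PRICE_PER_1K_CHARS = Decimal("0.016")


def estimate_tts_cost(text: str) -> Decimal:
    """Estimate the USD cost of synthesizing `text`.

    Assumption: each Unicode code point counts as one billable
    character. The actual billing rules may differ.
    """
    return PRICE_PER_1K_CHARS * len(text) / 1000


if __name__ == "__main__":
    # Stand-in for a 250,000-character script.
    script = "a" * 250_000
    print(f"Estimated cost: ${estimate_tts_cost(script):.2f}")
```

Using `Decimal` instead of floats avoids rounding drift when many small per-request estimates are summed.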