Evaluate Xiaomi's speech synthesis technology before committing to paid plans
The TTS model is free across all tiers for a limited time, so users can test voice quality and integration before purchasing the full one-time license.

Bilingual ASR for dialects, code-switching, and songs
MiMo-V2.5-ASR is an 8B open-source speech recognition model from Xiaomi that transcribes Mandarin, English, eight Chinese dialects, code-switched speech, and song lyrics. Built for ML engineers, researchers, and developers building real-world voice applications.
MiMo-V2.5 Voice provides text-to-speech synthesis as part of the flagship model suite, letting developers add voice generation to their applications with a one-time purchase.
MiMo-V2.5 Voice is positioned as a flagship model with advanced TTS capabilities for professional use cases that require high-quality voice output.
A one-time purchase unlocks both MiMo-V2.5 flagship models, including voice synthesis, which reduces subscription overhead and simplifies licensing for production deployments.