Implement speech recognition, text-to-speech, and multilingual translation in a unified model
Voila is designed as a unified foundation model that supports automatic speech recognition (ASR), text-to-speech (TTS), and multilingual speech translation with minimal adaptation, which reduces the complexity of building voice applications that need several of these capabilities.
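
To make the "one model, several tasks" idea concrete, here is a minimal sketch of how an application might route ASR, TTS, and translation requests through a single model entry point by tagging each request with its task. Everything in it is an assumption made for illustration: `Task`, `Request`, `build_prompt`, `UnifiedVoiceModel`, the tag format, and the `echo_backend` stub are hypothetical names, not Voila's actual API, and audio is represented by a plain string so the example stays self-contained.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Task(Enum):
    """Hypothetical task tags; the real model may select tasks differently."""
    ASR = "asr"
    TTS = "tts"
    TRANSLATE = "translate"


@dataclass
class Request:
    task: Task
    payload: str                       # text, or a stand-in string for audio
    target_lang: Optional[str] = None  # only used for translation


def build_prompt(req: Request) -> str:
    """Turn a request into one task-tagged prompt for a shared model."""
    if req.task is Task.TRANSLATE:
        if not req.target_lang:
            raise ValueError("translation requires target_lang")
        return f"<{req.task.value}:{req.target_lang}> {req.payload}"
    return f"<{req.task.value}> {req.payload}"


class UnifiedVoiceModel:
    """Single entry point: every task goes through the same backend call."""

    def __init__(self, backend: Callable[[str], str]):
        self.backend = backend

    def run(self, req: Request) -> str:
        return self.backend(build_prompt(req))


def echo_backend(prompt: str) -> str:
    """Stub standing in for a real model; it only echoes the prompt."""
    return f"[model output for {prompt}]"


if __name__ == "__main__":
    model = UnifiedVoiceModel(echo_backend)
    for req in (
        Request(Task.ASR, "clip.wav"),
        Request(Task.TTS, "hello"),
        Request(Task.TRANSLATE, "hello", target_lang="fr"),
    ):
        print(model.run(req))
```

The point of the sketch is the shape of the application code: adding a capability means adding a task tag, not wiring in a separate model and pipeline for each of ASR, TTS, and translation.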

