WhisperSpeech

Developed by
- Collabora and LAION
Model type
- Multilingual Diffusion model for Speech Synthesis
Task
- Text to Speech
Model description
- An Open Source text-to-speech system built by inverting Whisper. Previously known as spear-tts-pytorch.
- An easy way to try out voice cloning.
- Intended to be like Stable Diffusion, but for speech.
- Built on top of powerful Open Source models: Whisper from OpenAI to generate semantic tokens and perform transcription, EnCodec from Meta for acoustic modeling, and Vocos from Charactr Inc as the high-quality vocoder.
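
The way these components fit together can be pictured as a three-stage pipeline: text to semantic tokens, semantic tokens to acoustic tokens, and acoustic tokens to a waveform. The sketch below only illustrates that data flow. Every name in it (`TTSPipeline`, `toy_text_to_semantic`, and so on) is a hypothetical placeholder rather than the WhisperSpeech API, the stage order is an assumption drawn from the description above, and the toy functions stand in for the real models without doing any actual synthesis.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases, used only to make the data flow readable.
SemanticTokens = List[int]
AcousticTokens = List[int]
Waveform = List[float]


@dataclass
class TTSPipeline:
    """Hypothetical three-stage composition (not the WhisperSpeech API)."""

    # Stage 1 (assumed): predict Whisper-derived semantic tokens from text.
    text_to_semantic: Callable[[str], SemanticTokens]
    # Stage 2 (assumed): predict EnCodec-style acoustic tokens.
    semantic_to_acoustic: Callable[[SemanticTokens], AcousticTokens]
    # Stage 3 (assumed): turn acoustic tokens into audio samples (the Vocos role).
    vocoder: Callable[[AcousticTokens], Waveform]

    def synthesize(self, text: str) -> Waveform:
        semantic = self.text_to_semantic(text)
        acoustic = self.semantic_to_acoustic(semantic)
        return self.vocoder(acoustic)


# Toy stand-ins so the sketch runs without any model weights.
def toy_text_to_semantic(text: str) -> SemanticTokens:
    # One fake token per character.
    return [ord(ch) % 512 for ch in text]


def toy_semantic_to_acoustic(tokens: SemanticTokens) -> AcousticTokens:
    # Emit two fake acoustic tokens per semantic token.
    return [t for t in tokens for _ in range(2)]


def toy_vocoder(tokens: AcousticTokens) -> Waveform:
    # Map each fake token to a fake sample value.
    return [t / 1024.0 for t in tokens]


pipeline = TTSPipeline(toy_text_to_semantic, toy_semantic_to_acoustic, toy_vocoder)
audio = pipeline.synthesize("hi")
```

Keeping each stage behind a plain callable means any one of them could be replaced, for example by a different vocoder, without touching the other two.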