An Open Source text-to-speech system built by inverting Whisper. Previously known as spear-tts-pytorch.
It offers an easy way to try out voice cloning.
Think of it as Stable Diffusion, but for speech.
It is built on top of powerful Open Source models: Whisper from OpenAI to generate semantic tokens and perform transcription, EnCodec from Meta for acoustic modeling, and Vocos from Charactr Inc as the high-quality vocoder.
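
To make the division of labor between these components easier to picture, here is a minimal data-flow sketch. It is not the project's API: every function name, token range, codebook count, and the 24 kHz sample rate are hypothetical placeholders, and the stage bodies are toy arithmetic standing in for the real models. It only illustrates the assumed order of stages: text, then Whisper-style semantic tokens, then EnCodec-style acoustic tokens, then a waveform from a Vocos-style vocoder.

```python
from dataclasses import dataclass
from typing import List

# NOTE: all names and numbers here are hypothetical placeholders,
# not functions or settings from the actual project.


@dataclass
class Waveform:
    samples: List[float]
    sample_rate: int


def text_to_semantic(text: str) -> List[int]:
    """Stand-in for a model predicting Whisper-style semantic tokens from text."""
    return [ord(ch) % 512 for ch in text]


def semantic_to_acoustic(semantic: List[int], codebooks: int = 2) -> List[List[int]]:
    """Stand-in for a model predicting EnCodec-style acoustic tokens.

    Returns one token sequence per codebook.
    """
    return [[(tok + k) % 1024 for tok in semantic] for k in range(codebooks)]


def vocode(acoustic: List[List[int]], sample_rate: int = 24000) -> Waveform:
    """Stand-in for a Vocos-style vocoder turning acoustic tokens into audio."""
    # Combine the codebooks frame by frame into one toy sample in [0, 1).
    samples = [sum(frame) / (1024 * len(acoustic)) for frame in zip(*acoustic)]
    return Waveform(samples, sample_rate)


def synthesize(text: str) -> Waveform:
    """Chain the three stages in the assumed order."""
    return vocode(semantic_to_acoustic(text_to_semantic(text)))


wav = synthesize("hi")
print(len(wav.samples), wav.sample_rate)
```

In a real system each stub would be replaced by a trained model; the sketch only fixes the shape of the data handed from one stage to the next.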