Whisper Speech

  • Developed by
    • Collabora and LAION
  • Model type
    • Multilingual Diffusion model for Speech Synthesis
  • Task
    • Text to Speech
  • Model description
    • An Open Source text-to-speech system built by inverting Whisper. Previously known as spear-tts-pytorch.
    • An easy way to test voice-cloning.
    • Intended to work in the fashion of Stable Diffusion, but for speech.
    • Built on top of powerful Open Source models: Whisper from OpenAI to generate semantic tokens and perform transcription, EnCodec from Meta for acoustic modeling, and Vocos from Charactr Inc as the high-quality vocoder.
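
The description above suggests a staged data flow: text becomes semantic tokens, semantic tokens become acoustic tokens, and a vocoder turns those into audio. The following is a minimal sketch of that composition only, assuming this three-stage reading is correct. Every name in it (`TTSPipeline`, the `toy_*` functions, the token ranges) is hypothetical and is not part of the actual Whisper Speech code base; the toy stages are arithmetic stand-ins for the real neural models.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases for illustration; the real system uses tensors.
SemanticTokens = List[int]
AcousticTokens = List[int]
Waveform = List[float]


@dataclass
class TTSPipeline:
    """Chains three stages; each stage is an injected callable (assumed design)."""
    text_to_semantic: Callable[[str], SemanticTokens]
    semantic_to_acoustic: Callable[[SemanticTokens], AcousticTokens]
    vocoder: Callable[[AcousticTokens], Waveform]

    def synthesize(self, text: str) -> Waveform:
        semantic = self.text_to_semantic(text)           # Whisper-style semantic tokens
        acoustic = self.semantic_to_acoustic(semantic)   # EnCodec-style acoustic tokens
        return self.vocoder(acoustic)                    # Vocos-style waveform synthesis


# Toy stand-ins so the sketch runs without any model weights.
def toy_text_to_semantic(text: str) -> SemanticTokens:
    # Map each character to a token id in an assumed vocabulary of 512.
    return [ord(ch) % 512 for ch in text]


def toy_semantic_to_acoustic(tokens: SemanticTokens) -> AcousticTokens:
    # Emit two acoustic tokens per semantic token (an arbitrary toy ratio).
    return [tok for tok in tokens for _ in range(2)]


def toy_vocoder(tokens: AcousticTokens) -> Waveform:
    # Scale token ids into a small float range to mimic audio samples.
    return [tok / 1024.0 for tok in tokens]


pipeline = TTSPipeline(toy_text_to_semantic, toy_semantic_to_acoustic, toy_vocoder)
audio = pipeline.synthesize("Hello")
```

Keeping each stage behind a plain callable means any one of them could be swapped for a real model without touching the others, which is the point of building on separate components.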