v4.52.4-Kyutai-STT-preview

📦 transformers
3 features · 🔧 2 symbols

Summary

This release introduces a preview of the Kyutai-STT model architecture, featuring 1B and 2.6B parameter checkpoints for high-accuracy speech-to-text transcription.

Migration Steps

  1. Install the preview version using: pip install git+https://github.com/huggingface/transformers@v4.52.4-Kyutai-STT-preview
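After installing, it can be useful to confirm that the environment actually picked up the preview build. The sketch below is a hypothetical helper (`missing_symbols` is not part of transformers) that reports which expected names a module fails to expose.

```python
import importlib


def missing_symbols(module_name, names):
    """Return the entries of `names` that the module does not expose.

    Hypothetical helper for illustration; it only imports the module
    and probes attributes, it does not load any model weights.
    """
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# Symbols this release is expected to add (taken from the notes).
KYUTAI_STT_SYMBOLS = [
    "KyutaiSpeechToTextProcessor",
    "KyutaiSpeechToTextForConditionalGeneration",
]
```

Calling `missing_symbols("transformers", KYUTAI_STT_SYMBOLS)` should return an empty list if the preview install succeeded; any names it returns suggest an older transformers build is still being imported.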

✨ New Features

  • Added Kyutai-STT model architecture, a speech-to-text model based on the Mimi codec and a Moshi-like autoregressive decoder.
  • Support for kyutai/stt-1b-en_fr (1B parameters, English and French transcription).
  • Support for kyutai/stt-2.6b-en (2.6B parameters, English transcription).
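The language coverage of the two checkpoints can be captured in a small lookup. This is only an illustrative sketch: `CHECKPOINTS` and `checkpoints_for` are hypothetical names, and the identifiers are copied from the list above (the repositories that hold transformers-format weights may be published under different ids).

```python
# Checkpoint -> supported transcription languages, as listed in these notes.
CHECKPOINTS = {
    "kyutai/stt-1b-en_fr": {"en", "fr"},
    "kyutai/stt-2.6b-en": {"en"},
}


def checkpoints_for(language: str) -> list[str]:
    """Return the checkpoints that list `language` (e.g. "en", "fr")."""
    lang = language.strip().lower()
    return [name for name, langs in CHECKPOINTS.items() if lang in langs]
```

For French audio only the 1B checkpoint qualifies, while English audio can use either one.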

🔧 Affected Symbols

  • KyutaiSpeechToTextProcessor
  • KyutaiSpeechToTextForConditionalGeneration
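A minimal sketch of how the two symbols might fit together for transcription, assuming they follow the usual transformers processor/model pattern (`from_pretrained`, `generate`, `batch_decode`). The helper names `load_kyutai_stt` and `transcribe` are hypothetical, and the waveform is assumed to be mono audio at the sampling rate the processor expects (the Mimi codec operates at 24 kHz).

```python
def load_kyutai_stt(model_id, device="cpu"):
    """Load the processor and model for a Kyutai-STT checkpoint.

    The import is deferred so this module can be imported without
    transformers installed. `model_id` is assumed to be a checkpoint
    that ships transformers-format weights.
    """
    from transformers import (
        KyutaiSpeechToTextForConditionalGeneration,
        KyutaiSpeechToTextProcessor,
    )

    processor = KyutaiSpeechToTextProcessor.from_pretrained(model_id)
    model = KyutaiSpeechToTextForConditionalGeneration.from_pretrained(model_id)
    return processor, model.to(device)


def transcribe(waveform, processor, model):
    """Run one waveform through processor -> generate -> decode."""
    # Assumption: the processor accepts a raw waveform as its first argument.
    inputs = processor(waveform)
    inputs = inputs.to(model.device)
    # Autoregressive decoding of text tokens from the audio features.
    tokens = model.generate(**inputs)
    return processor.batch_decode(tokens, skip_special_tokens=True)
```

Typical use would be `processor, model = load_kyutai_stt(model_id)` followed by `transcribe(waveform, processor, model)`, which returns a list of decoded strings, one per input in the batch.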