v5.9.0
transformers
⚠ 1 breaking · ✨ 9 features · 🐛 18 fixes · 🔧 10 symbols
Summary
This release introduces several new models, including Cohere2Moe and HRM-Text, alongside significant improvements in audio processing and generation stability. Breaking changes affect input expectations for SAM3 family models.
⚠️ Breaking Changes
- The `text_embeds` input for SAM3, EdgeTAM, and SAM3-Lite-Text models now expects full text embeddings rather than pooler outputs. Code that passes pooler outputs must be updated.
Migration Steps
- Update inputs for SAM3, EdgeTAM, and SAM3-Lite-Text models to pass full text embeddings instead of just pooler outputs to the `text_embeds` argument.
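A guard like the one below can help catch call sites that still pass pooled embeddings. It is only a sketch under stated assumptions: the release note does not specify tensor shapes, so the code assumes pooler outputs are 2-D `(batch, hidden)` and full text embeddings are 3-D `(batch, seq_len, hidden)`. The helper `validate_text_embeds` is hypothetical and not part of transformers.

```python
from types import SimpleNamespace


def validate_text_embeds(text_embeds):
    """Hypothetical pre-flight check, not a transformers API.

    Assumes (not confirmed by the release note) that pooled embeddings are
    2-D (batch, hidden) and full embeddings are 3-D (batch, seq_len, hidden).
    Works with any object exposing a `.shape`, such as a torch tensor.
    """
    shape = tuple(text_embeds.shape)
    if len(shape) == 2:
        raise ValueError(
            f"text_embeds has shape {shape}; this looks like a pooler output. "
            "Pass the full per-token text embeddings instead."
        )
    if len(shape) != 3:
        raise ValueError(f"Unexpected text_embeds shape: {shape}")
    return shape


# Stand-ins for real tensors so the sketch runs without torch installed.
pooled = SimpleNamespace(shape=(2, 512))
full = SimpleNamespace(shape=(2, 16, 512))

validate_text_embeds(full)  # passes under the assumed convention
```

Calling `validate_text_embeds(pooled)` raises `ValueError` under the assumed shape convention, which makes a stale call site fail early with a readable message.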
✨ New Features
- Added support for the Cohere Command A+ (`cohere2_moe`) Mixture-of-Experts model.
- Added support for the Parakeet TDT model.
- Added support for the HRM-Text model, featuring a hierarchical recurrent forward pass and PrefixLM attention.
- Expanded audio support with AudioFlamingoNext model checkpoints.
- Improved compilability of audio/vision encoders via standalone pure functions.
- Enhanced generation handling for Gemma4 regarding `inputs_embeds` and `per_layer_inputs`.
- Enhanced `apply_chat_template` to support custom field prefilling (e.g., `reasoning_content`, `thinking`).
- Added initial `torch_tpu` backend support.
- Added tensor parallelism support (PR: "[CB] [Major] Add tensor parallelism").
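For readers unfamiliar with tensor parallelism, the general idea behind the last item can be sketched in plain Python. This is a toy illustration of column-parallel matrix multiplication, not the implementation added in this release: each weight shard would normally live on its own device, and the final concatenation stands in for an all-gather. All function names here are illustrative.

```python
def matmul(x, w):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, parts):
    """Split the weight matrix into `parts` equal column shards."""
    n_cols = len(w[0])
    if n_cols % parts != 0:
        raise ValueError("number of columns must be divisible by parts")
    step = n_cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def column_parallel_matmul(x, w, parts):
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    shards = split_columns(w, parts)
    # In a real setup each product would run on a different device.
    partials = [matmul(x, shard) for shard in shards]
    # Concatenate along the column axis (the role an all-gather would play).
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
sharded = column_parallel_matmul(x, w, parts=2)
```

The sharded result equals `matmul(x, w)`, which is the property that lets a large linear layer be spread across devices without changing its output.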
🐛 Bug Fixes
- Fixed memory leaks caused by `lru_cache` decorators in vision models.
- Improved error messaging when loading audio from video files.
- Fixed generation issues related to `inputs_embeds` and `per_layer_inputs` handling for Gemma4.
- Fixed `AttributeError` in RAG's `generate()` caused by missing config fields.
- Blocked special image tokens during sampling in VLM generation tests to prevent flakiness.
- Removed mask visualization tool from `masking_utils.py`.
- Fixed `owned_by` field in GET /v1/models to return a list instead of a string.
- Fixed remaining RAG doc examples that crash on current transformers.
- Ensured that initialization is applied to the actual tensor rather than to a copy.
- Fixed device mismatch for M-RoPE in Qwen3VL family under FSDP2 CPU offload.
- Restored `_attn_implementation` and fixed request offset in `generate_batch()`.
- Exposed `per_layer_inputs` for every Gemma4 variant.
- Fixed the ColQwen2 test.
- Fixed an undefined `input` variable.
- Fixed post-processing for RF-DETR.
- Fixed Hubert models that do not have `conv_pos_batch_norm` configured.
- Reverted change 45777.
- Required `input_ids` for repetition penalty calculation.
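The memory-leak fix listed above does not say how the leak arose or how it was fixed. One common way an LRU decorator leaks is sketched below, assuming the decorator in question is `functools.lru_cache`: when it wraps a method, the cache key includes `self`, so the class-level cache keeps every instance alive. The class and function names are illustrative, and moving the cached logic into a standalone function is only one possible remedy.

```python
import functools
import gc
import weakref


class Leaky:
    # The cache lives on the class-level function and stores `self` in its keys,
    # so each instance that calls compute() stays referenced by the cache.
    @functools.lru_cache(maxsize=None)
    def compute(self, n):
        return n * 2


@functools.lru_cache(maxsize=None)
def _double(n):
    # Standalone cached function: keys contain only `n`, never an instance.
    return n * 2


class Fixed:
    def compute(self, n):
        return _double(n)


def is_collected_after_use(cls):
    """Return True if an instance is freed once its last user reference is dropped."""
    obj = cls()
    obj.compute(3)
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is None


leaky_freed = is_collected_after_use(Leaky)   # instance is retained by the cache
fixed_freed = is_collected_after_use(Fixed)   # instance is released
```

In the leaky variant, calling `Leaky.compute.cache_clear()` releases the retained instances, which is a quick way to confirm that the cache is what holds them.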