
v5.9.0

📦 transformers
1 breaking, 9 features, 🐛 18 fixes, 🔧 10 symbols

Summary

This release introduces several new models, including Cohere2Moe and HRM-Text, alongside significant improvements in audio processing and generation stability. Breaking changes affect input expectations for SAM3 family models.

⚠️ Breaking Changes

  • The `text_embeds` input for SAM3, EdgeTAM, and SAM3-Lite-Text models now expects full text embeddings instead of just pooler outputs. Users must update their inputs to provide full text embeddings for these models.

Migration Steps

  1. Update inputs for SAM3, EdgeTAM, and SAM3-Lite-Text models to pass full text embeddings instead of just pooler outputs to the `text_embeds` argument.
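
As a quick pre-flight check during migration, the sketch below classifies a tensor shape as pooled or full text embeddings. It rests on an assumption that is not stated in these notes: that pooler outputs have shape `(batch, hidden)` and full text embeddings have shape `(batch, seq_len, hidden)`. The helper name `classify_text_embeds` is hypothetical and is not part of transformers.

```python
from typing import Sequence


def classify_text_embeds(shape: Sequence[int]) -> str:
    """Guess whether a shape looks like pooled or full text embeddings.

    ASSUMPTION (not confirmed by the release notes):
      pooled output        -> (batch, hidden)
      full text embeddings -> (batch, seq_len, hidden)
    """
    if len(shape) == 2:
        return "pooled"
    if len(shape) == 3:
        return "full"
    raise ValueError(f"unexpected rank {len(shape)} for text_embeds shape {tuple(shape)}")


if __name__ == "__main__":
    # Example shapes only; the sizes are made up for illustration.
    print(classify_text_embeds((2, 512)))      # pooled
    print(classify_text_embeds((2, 32, 512)))  # full
```

With a real tensor, the call would look like `classify_text_embeds(tuple(text_embeds.shape))`; a result of `"pooled"` suggests the input still follows the pre-5.9.0 convention.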

✨ New Features

  • Added support for the Cohere Command A+ (cohere2_moe) Mixture-of-Experts model.
  • Added support for the Parakeet TDT model.
  • Added support for the HRM-Text model, featuring hierarchical recurrent forward pass and PrefixLM attention.
  • Expanded audio support with AudioFlamingoNext model checkpoints.
  • Improved compilability of audio/vision encoders via standalone pure functions.
  • Enhanced generation handling for Gemma4 regarding `inputs_embeds` and `per_layer_inputs`.
  • Enhanced `apply_chat_template` to support custom field prefilling (e.g., reasoning_content, thinking).
  • Added initial torch_tpu backend support.
  • Added tensor parallelism support ([CB] [Major] Add tensor parallelism).
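
To make the "PrefixLM attention" mentioned for HRM-Text concrete, here is a minimal pure-Python sketch of a prefix-LM attention mask in its generic form: positions inside the prefix attend to the whole prefix bidirectionally, and every later position attends causally. The function name `prefix_lm_mask` and its parameters are hypothetical; this illustrates the general technique and is not the HRM-Text implementation.

```python
from typing import List


def prefix_lm_mask(seq_len: int, prefix_len: int) -> List[List[bool]]:
    """Return mask[query][key]; True means the query may attend to the key.

    A key is visible if it is at or before the query (causal rule)
    or if it lies inside the prefix (bidirectional prefix rule).
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must be between 0 and seq_len")
    return [
        [key <= query or key < prefix_len for key in range(seq_len)]
        for query in range(seq_len)
    ]


if __name__ == "__main__":
    for row in prefix_lm_mask(seq_len=4, prefix_len=2):
        print("".join("1" if allowed else "0" for allowed in row))
    # 1100
    # 1100
    # 1110
    # 1111
```

Setting `prefix_len=0` reduces the mask to the ordinary causal (lower-triangular) mask.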

🐛 Bug Fixes

  • Fixed memory leaks caused by `lru_cache` decorators in vision models.
  • Improved error messaging when loading audio from video files.
  • Fixed generation issues related to `inputs_embeds` and `per_layer_inputs` handling for Gemma4.
  • Fixed `AttributeError` in RAG's `generate()` caused by missing config fields.
  • Blocked special image tokens during sampling in VLM generation tests to prevent flakiness.
  • Removed mask visualization tool from `masking_utils.py`.
  • Fixed `owned_by` field in `GET /v1/models` to return a list instead of a string.
  • Fixed remaining RAG doc examples that crash on current transformers.
  • Ensured that initialization operates on the actual tensor rather than on a copy.
  • Fixed device mismatch for M-RoPE in Qwen3VL family under FSDP2 CPU offload.
  • Restored `_attn_implementation` and fixed request offset in `generate_batch()`.
  • Exposed `per_layer_inputs` for every Gemma4 variant.
  • Fixed Colqwen2 test.
  • Fixed undefined 'input' variable.
  • Fixed post processing for RF-DETR.
  • Fixed Hubert models that do not have `conv_pos_batch_norm` configured.
  • Reverted change 45777.
  • Made `input_ids` required for repetition penalty calculation.
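
The repetition penalty entry above is easier to follow with the rule written out: the penalty only touches tokens that already appear in the sequence, so the token ids are needed to know which logits to rescale. The following is a minimal pure-Python sketch of the commonly used rule (divide positive logits by the penalty, multiply negative ones); the function name `apply_repetition_penalty` is hypothetical, and this is not the library's code.

```python
from typing import List, Sequence


def apply_repetition_penalty(
    logits: Sequence[float], input_ids: Sequence[int], penalty: float
) -> List[float]:
    """Penalize the logits of tokens that already occur in input_ids."""
    if penalty <= 0:
        raise ValueError("penalty must be strictly positive")
    out = list(logits)
    # Each previously seen token is penalized once, however often it occurred.
    for token_id in set(input_ids):
        score = out[token_id]
        # Negative scores are multiplied and positive scores divided, so
        # (for penalty > 1) a seen token always becomes less likely.
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5, 3.0]
    print(apply_repetition_penalty(logits, input_ids=[0, 1, 1], penalty=2.0))
    # [1.0, -2.0, 0.5, 3.0]
```

Tokens 2 and 3 never appear in `input_ids`, so their logits are unchanged; without `input_ids` there is nothing to tell the rule which entries to rescale.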

Affected Symbols