v5.6.0

Breaking Changes
📦 transformers
1 breaking · 9 features · 🐛 12 fixes · 🔧 12 symbols

Summary

Version 5.6.0 introduces four new models: OpenAI Privacy Filter, Qianfan-OCR, SAM3-LiteText, and SLANet. This release also extends `transformers serve` with a `/v1/completions` endpoint, audio and video inputs, and improved tool calling, alongside fixes in vision, parallelization, and tokenization.

⚠️ Breaking Changes

  • The internal `rotary_fn` is no longer registered as a hidden kernel function. Code referencing `self.rotary_fn(...)` within an Attention module must be updated to call the function directly instead.

Migration Steps

  1. In any Attention module, replace calls to `self.rotary_fn(...)` with a direct call to the rotary function itself.
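
The migration can be illustrated with a minimal sketch. It is a toy stand-in, not the library's code: `rotate_half`, `apply_rotary_pos_emb`, `AttentionBefore`, and `AttentionAfter` are defined here purely for illustration and operate on plain Python lists rather than tensors. The only point is the change in call style, from an attribute lookup on the module to a direct function call.

```python
import math


def rotate_half(x):
    """Swap the two halves of x and negate the (new) first half."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rotary_pos_emb(q, k, cos, sin):
    """Toy rotary embedding on flat lists (illustrative stand-in only)."""
    q_rot = [a * c + b * s for a, b, c, s in zip(q, rotate_half(q), cos, sin)]
    k_rot = [a * c + b * s for a, b, c, s in zip(k, rotate_half(k), cos, sin)]
    return q_rot, k_rot


class AttentionBefore:
    """Old pattern: the module is expected to carry a `rotary_fn` attribute."""

    def __init__(self):
        # Set by hand here for illustration; custom code should no longer
        # assume this attribute exists on Attention modules.
        self.rotary_fn = apply_rotary_pos_emb

    def forward(self, q, k, cos, sin):
        return self.rotary_fn(q, k, cos, sin)


class AttentionAfter:
    """New pattern: call the rotary function directly."""

    def forward(self, q, k, cos, sin):
        return apply_rotary_pos_emb(q, k, cos, sin)


if __name__ == "__main__":
    angle = math.pi / 2
    cos, sin = [math.cos(angle)] * 2, [math.sin(angle)] * 2
    q, k = [1.0, 0.0], [0.0, 1.0]
    # Both call styles produce the same rotated vectors.
    print(AttentionBefore().forward(q, k, cos, sin))
    print(AttentionAfter().forward(q, k, cos, sin))
```

In real code, the function to call directly would be whichever rotary helper the model's modeling file already defines; that name is not specified in these notes.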

✨ New Features

  • Added OpenAI Privacy Filter, a bidirectional token-classification model for PII detection and masking.
  • Added Qianfan-OCR, a 4B-parameter end-to-end document intelligence model supporting prompt-driven tasks like structured parsing and table extraction.
  • Added SAM3-LiteText, a lightweight variant of SAM3 using a MobileCLIP-based text encoder for efficient vision-language segmentation.
  • Added SLANet and SLANet_plus models for accurate table structure recognition.
  • The `transformers serve` command now includes a new `/v1/completions` endpoint for legacy text completion.
  • Added multimodal support (audio and video inputs) to `transformers serve`.
  • Improved tool-calling support in `transformers serve` via `parse_response` and proper forwarding of `tool_calls`/`tool_call_id` fields.
  • Added support for loading adapters with Tensor Parallelism.
  • Added MoE configuration to the Gemma4 Tensor Parallelism plan.
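
As a rough sketch of how the new `/v1/completions` endpoint listed above might be called, the snippet below builds a legacy-style text completion request using only the standard library. Several details are assumptions rather than facts from these notes: the base URL (`http://localhost:8000`), the model identifier, and the payload fields (`model`, `prompt`, `max_tokens`), which follow the common OpenAI-style completions convention. Check the server's own documentation for the fields it actually accepts.

```python
import json
import urllib.request

# Assumption: a server started with `transformers serve` is reachable here.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt, model, max_tokens=32, base_url=BASE_URL):
    """Build (but do not send) a POST request for the completions endpoint."""
    # Assumed payload shape, modeled on the legacy completions convention.
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # "your-model-id" is a placeholder, not a real model name.
    req = build_completion_request("Once upon a time", model="your-model-id")
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # Uncomment once a server is running:
    # print(send(req))
```

Building the request separately from sending it keeps the example inspectable without a live server.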

🐛 Bug Fixes

  • Fixed Qwen2.5-VL temporal RoPE scaling being incorrectly applied to still images.
  • Fixed missing or mismatched image processor backends for Emu3 and BLIP models.
  • Resolved issues related to modular image processor class duplication.
  • Prevented accelerate from incorrectly splitting vision encoders in PeVideo/PeAudioVideo models.
  • Improved image loading performance by leveraging torchvision's native `decode_image` in the torchvision backend (up to ~17% speedup).
  • Fixed Expert Parallelism issues: `RouterParallel` shape, `tp_plan` property, and `grouped_mm` sentinels.
  • Fixed NaN weights appearing on non-rank-0 FSDP processes.
  • Fixed a resize failure in PP-DocLayoutV3 caused by zero-sized masks.
  • Fixed a docstring typo in streamer classes ('tokenized' -> 'tokenizer').
  • Resolved a Kimi-K2.5 tokenizer regression and `_patch_mistral_regex` AttributeError.
  • Patched a streaming generation crash for `Qwen3VLProcessor` caused by incorrect `_tokenizer` attribute access.
  • Fixed a global state leak in the tokenizer registry during tests.

Affected Symbols