v5.6.0

Breaking Changes

📅 Apr 22, 2026 · 📦 transformers

⚠ 1 breaking · ✨ 9 features · 🐛 12 fixes · 🔧 12 symbols

Summary

Version 5.6.0 introduces four new models: OpenAI Privacy Filter, Qianfan-OCR, SAM3-LiteText, and SLANet. This release also significantly enhances `transformers serve` with new endpoints and multimodal support, alongside several critical fixes in vision, parallelization, and tokenization.

⚠️ Breaking Changes

The internal `rotary_fn` is no longer registered as a hidden kernel function. Code referencing `self.rotary_fn(...)` within an Attention module must be updated to call the function directly instead.

Migration Steps

Find every `self.rotary_fn(...)` call inside an Attention module and replace it with a direct call to the rotary function; the `self.rotary_fn` attribute is no longer registered.
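The shape of this migration can be sketched with a toy example. Everything here is hypothetical and pure Python: `apply_rotary` and `ToyAttention` are illustrative stand-ins, not the library's own function or class, and real code would operate on tensors and call whichever rotary helper the model's modeling file defines.

```python
def apply_rotary(q, k, cos, sin):
    """Toy module-level rotary helper (hypothetical stand-in)."""

    def rotate(x):
        # Rotate-half: swap the two halves of the vector and negate the second.
        half = len(x) // 2
        rotated = [-v for v in x[half:]] + list(x[:half])
        return [a * c + b * s for a, b, c, s in zip(x, rotated, cos, sin)]

    return rotate(q), rotate(k)


class ToyAttention:
    """Hypothetical attention module used only to illustrate the change."""

    def forward(self, q, k, cos, sin):
        # Before 5.6.0, code could rely on an attribute on the module:
        #     q, k = self.rotary_fn(q, k, cos, sin)
        # From 5.6.0 on, call the function directly instead:
        return apply_rotary(q, k, cos, sin)


if __name__ == "__main__":
    attn = ToyAttention()
    # With cos = 1 and sin = 0 the rotation is the identity.
    print(attn.forward([1.0, 2.0], [3.0, 4.0], [1.0, 1.0], [0.0, 0.0]))
```

The only change at the call site is dropping the `self.` indirection; the arguments stay the same.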

✨ New Features

Added OpenAI Privacy Filter, a bidirectional token-classification model for PII detection and masking.
Added Qianfan-OCR, a 4B-parameter end-to-end document intelligence model supporting prompt-driven tasks like structured parsing and table extraction.
Added SAM3-LiteText, a lightweight variant of SAM3 using a MobileCLIP-based text encoder for efficient vision-language segmentation.
Added SLANet and SLANet_plus models for accurate table structure recognition.
The `transformers serve` command now includes a new `/v1/completions` endpoint for legacy text completion.
Added multimodal support (audio and video inputs) to `transformers serve`.
Improved tool-calling support in `transformers serve` via `parse_response` and proper forwarding of `tool_calls`/`tool_call_id` fields.
Added support for loading adapters with Tensor Parallelism.
Added MoE configuration to the Gemma4 Tensor Parallelism plan.
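As a rough illustration of the new `/v1/completions` endpoint, the sketch below builds a request with the standard library only. Several things are assumptions rather than facts from these notes: the base URL and port, the placeholder model id `my-org/my-model`, the helper name `build_completion_request`, and that the endpoint accepts the OpenAI-style legacy completions fields (`model`, `prompt`, `max_tokens`).

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=32):
    """Build (but do not send) a POST request for a legacy text completion.

    Hypothetical helper; the payload fields follow the OpenAI-style
    completions schema, which is assumed here.
    """
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumed address of a locally running `transformers serve` instance.
    req = build_completion_request(
        "http://localhost:8000", "my-org/my-model", "Once upon a time"
    )
    print(req.get_method(), req.full_url)
    # Sending requires a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Building the request separately from sending it makes the payload easy to inspect before pointing it at a live server.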

🐛 Bug Fixes

Fixed Qwen2.5-VL temporal RoPE scaling being incorrectly applied to still images.
Fixed missing or mismatched image processor backends for Emu3 and BLIP models.
Resolved issues related to modular image processor class duplication.
Prevented accelerate from incorrectly splitting vision encoders in PeVideo/PeAudioVideo models.
Improved image loading performance by leveraging torchvision's native `decode_image` in the torchvision backend (up to ~17% speedup).
Fixed Expert Parallelism issues: `RouterParallel` shape, the `tp_plan` property, and `grouped_mm` sentinels.
Fixed NaN weights appearing on non-rank-0 FSDP processes.
Fixed a resize failure in PP-DocLayoutV3 caused by zero-sized masks.
Fixed a docstring typo in streamer classes ('tokenized' -> 'tokenizer').
Resolved a Kimi-K2.5 tokenizer regression and an `AttributeError` raised in `_patch_mistral_regex`.
Patched a streaming generation crash for `Qwen3VLProcessor` caused by incorrect `_tokenizer` attribute access.
Fixed a global state leak in the tokenizer registry during tests.

Affected Symbols

Attention module (internal rotary_fn)
Qwen2.5-VL
Emu3 image processor
BLIP image processor
PeVideo model encoder
PeAudioVideo model encoder
RouterParallel (Expert Parallelism)
FSDP processes
PP-DocLayoutV3
streamer classes
Kimi-K2.5 tokenizer
Qwen3VLProcessor