v5.2.0
📦 transformers
⚠ 2 breaking · ✨ 4 features · 🐛 18 fixes · 🔧 17 symbols
Summary
This release introduces several major new models including VoxtralRealtime, GLM-5, and Qwen3.5, alongside significant internal refactoring, particularly around attention mechanisms and trainer stability.
⚠️ Breaking Changes
- The attention mask interface has been updated across the library; any custom attention logic that builds or consumes attention masks should be reviewed.
- ModernBERT's default attention implementation no longer uses Flash Attention (FA), which may change performance or numerics if FA was implicitly relied upon.
Migration Steps
- Review and update any custom attention logic against the new attention mask interface.
- If you relied on ModernBERT's default attention implementation, note that it no longer uses Flash Attention; opt in explicitly if you still need it.
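As a reference point for that audit, below is a generic causal additive mask in the common convention (zero where attention is allowed, minus infinity on future positions). This is an illustrative sketch of the usual mask shape, not transformers' new interface; check whether your custom code assumes this convention:

```python
def causal_additive_mask(seq_len: int, neg: float = float("-inf")) -> list[list[float]]:
    # Additive convention: 0.0 where query position i may attend to key position j,
    # -inf where j is in the future (j > i), so softmax zeroes those scores out.
    return [[0.0 if j <= i else neg for j in range(seq_len)] for i in range(seq_len)]

mask = causal_additive_mask(4)
# Row 0 can only attend to position 0; row 3 attends to all four positions.
```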
✨ New Features
- Introduction of VoxtralRealtime, a streaming speech-to-text model from Mistral AI for low-latency, incremental ASR.
- Addition of GLM-5 (GlmMoeDsa), featuring 744B parameters (40B active) and integrating DeepSeek Sparse Attention (DSA) for improved efficiency on long-context tasks.
- Introduction of Qwen3.5 and Qwen3.5 Moe (specifically Qwen3.5-397B-A17B), a native vision-language model with a hybrid architecture combining linear attention (Gated Delta Networks) with sparse MoE, achieving high efficiency and expanding language support to 201 dialects.
- Addition of VibeVoice Acoustic Tokenizer framework for synthesizing high-fidelity, long-form speech using a next-token diffusion approach.
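To illustrate why a sparse MoE model like GLM-5 activates only ~40B of its 744B parameters per token, here is a minimal top-k expert-routing (gating) sketch. `top_k_route` is a hypothetical helper for illustration, not the GlmMoeDsa implementation:

```python
import math

def top_k_route(logits: list[float], k: int) -> dict[int, float]:
    # Keep only the k highest-scoring experts, then softmax over that subset
    # so the kept routing weights sum to 1; all other experts stay idle.
    idx = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    m = max(logits[i] for i in idx)  # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in idx}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}

weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
# Experts 1 and 3 are selected; the token is processed by only those two.
```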
🐛 Bug Fixes
- Fixed `convert_rope_params_to_dict` to correctly use `rope_theta` from the configuration.
- Fixed Qwen RMS norms in modular dependencies.
- Fixed BLOOM tokenizer, CLAP audio features, and CLVP text tester usage in tests.
- Prevented AutoTokenizer type mismatch caused by directory name substring matching.
- Fixed DeepSpeed model preparation logic within the Trainer class.
- Fixed incorrect timestamp calculation in Qwen3VL Processor.
- Fixed GPT-OSS crash when using tensor parallelism (TP).
- Removed `batch_split` from EncoderDecoderCache.
- Removed unnecessary code to make MoE compatible with full-graph compile.
- Updated ModelType for Unigram tokenizer.
- Fixed video interpolation in `pe_audio_video`.
- Fixed looking for `pad_token_id` in the wrong place for Llama4.
- Fixed cardinality error for DETR models lacking an explicit background class.
- Fixed bugs in xLSTM preventing small model training.
- Fixed GlmMoeDsaConfig default `mlp_layer_types` during modular conversion.
- Fixed loading procedure in `MistralCommonBackend`.
- Jamba implementation now falls back to the slow path with a warning instead of raising an error.
- Fixed SwanLab callback to forward resume initialization arguments.
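For context on the `convert_rope_params_to_dict` fix, `rope_theta` determines the standard RoPE inverse-frequency table, so reading it from the wrong place silently changes every rotary frequency. A minimal sketch of that derivation (`rope_inv_freq` is a hypothetical name, not a transformers function):

```python
def rope_inv_freq(head_dim: int, rope_theta: float = 10000.0) -> list[float]:
    # One inverse frequency per rotated pair of dimensions:
    # inv_freq[i] = rope_theta ** (-2i / head_dim), for i = 0 .. head_dim/2 - 1.
    return [rope_theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]

freqs = rope_inv_freq(8)
# freqs[0] is always 1.0; later entries decay toward rope_theta ** -1.
```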