v4.57.0
📦 transformers
✨ 6 features · 🔧 7 symbols
Summary
This release introduces support for several next-generation model architectures, including the high-efficiency Qwen3-Next and multimodal Qwen3-VL series, the privacy-focused VaultGemma, and the high-speed LongCat-Flash MoE.
Migration Steps
- 1. Review the release notes for the newly added models: Qwen3-Next, VaultGemma, Qwen3-VL, LongCat-Flash, FlexOlmo, and LFM2-VL.
- 2. If you intend to use any of these models, upgrade transformers to v4.57.0 or later so the new architectures are available; a loading sketch follows this list.
- 3. If you were using older Qwen models, note that Qwen3-Next introduces architectural changes such as Hybrid Attention and High-Sparsity MoE. Direct migration steps aren't specified, so expect possible performance differences or configuration changes when switching to Qwen3-Next.
- 4. If you are using Gemma models, be aware of the new VaultGemma variant, which notably drops the normalization layers after the Attention and MLP blocks. If you relied on those norms for stability or specific behavior, you may need to adjust your configuration when switching to VaultGemma.
- 5. For users interested in multimodal capabilities, begin exploring the Qwen3-VL series, noting its enhanced MRoPE and DeepStack integration.
- 6. For long-context tasks requiring high throughput, consider integrating LongCat-Flash, paying attention to its shortcut-connected MoE architecture and dynamic parameter activation.
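As a rough starting point after upgrading, the sketch below shows how one of the new text-only decoders could be loaded through the generic Auto classes. It is a minimal sketch, assuming transformers >= 4.57.0 is installed; the checkpoint ID, prompt, and generation settings are illustrative and not taken from the release notes.

```python
# Minimal sketch, assuming transformers >= 4.57.0 is installed
# (e.g. `pip install -U "transformers>=4.57.0"`).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"  # illustrative checkpoint ID; substitute the Hub ID you need

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # requires `accelerate`; omit for a plain single-device load
)

prompt = "Give a one-sentence summary of hybrid attention."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same Auto-class pattern should also cover VaultGemma, LongCat-Flash, and FlexOlmo checkpoints, since they are described below as text-only decoder or MoE architectures.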
✨ New Features
- Added support for Qwen3-Next series featuring Hybrid Attention (Gated DeltaNet + Gated Attention) and High-Sparsity MoE.
- Added support for VaultGemma, a text-only decoder model with sequence-level differential privacy (DP-SGD).
- Added support for the Qwen3-VL multimodal vision-language series with MRoPE and DeepStack integration (see the vision-language loading sketch after this list).
- Added support for LongCat-Flash, a 560B MoE model with a shortcut-connected architecture for high inference speed.
- Added support for FlexOlmo, a mixture-of-experts architecture supporting distributed training without data sharing.
- Added support for LFM2-VL (1.6B and 450M variants), low-latency vision-language models using SigLIP2 NaFlex encoders.
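For the multimodal additions (Qwen3-VL and LFM2-VL), a plausible call path combines AutoProcessor with AutoModelForImageTextToText. This is a sketch based on the generic transformers multimodal API, not a model-specific recipe: the checkpoint ID and image URL are placeholders, and per-model preprocessing details may differ.

```python
# Minimal sketch for the new vision-language additions (Qwen3-VL, LFM2-VL),
# assuming they are exposed through the generic image-text-to-text Auto classes.
# The checkpoint ID and image URL are placeholders, not taken from the release notes.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "org/vision-language-checkpoint"  # placeholder: substitute a real Qwen3-VL or LFM2-VL Hub ID

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

# Chat-style multimodal input; exact content keys may vary per processor.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/sample.png"},  # placeholder image
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```

Depending on the checkpoint, inputs may instead be built with a direct `processor(text=..., images=...)` call; consult each model's documentation page once it is published.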
🔧 Affected Symbols
Qwen3-Next · VaultGemma · Qwen3-VL · LongCat-Flash · FlexOlmo · LFM2-VL · SigLIP2