v5.7.0
📦 transformers
✨ 2 features · 🐛 19 fixes · 🔧 14 symbols
Summary
This release introduces two major new model families, Laguna and DEIMv2, alongside numerous fixes for attention mechanisms, continuous batching generation, and kernel loading across various models.
Migration Steps
- If AutoTokenizer was initializing the wrong tokenizer class for you (e.g., for DeepSeek R1), the change that caused this regression has been reverted; upgrading to this release should resolve the issue.
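One way to confirm that an environment already includes the AutoTokenizer revert is to compare the installed version against this release. The sketch below is a minimal illustration, not an official check: `parse_release` and `has_tokenizer_fix` are hypothetical helper names, and the comparison only looks at the leading numeric components, so pre-release or dev builds of 5.7.0 are treated as containing the fix, which is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Return the leading integer components of a version string.

    '5.7.0' -> (5, 7, 0); '5.7.0.dev0' -> (5, 7, 0).
    Hypothetical helper; suffixes such as 'rc1' or 'dev0' are ignored.
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_tokenizer_fix(installed: str, minimum: str = "5.7.0") -> bool:
    """True if `installed` is at least the release that reverted the change."""
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        print(installed, "includes the revert:", has_tokenizer_fix(installed))
```

If the check passes and a tokenizer still looks wrong, printing `type(tokenizer).__name__` after `AutoTokenizer.from_pretrained(...)` shows which class was actually selected.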
✨ New Features
- Added support for the Laguna mixture-of-experts language model family, featuring per-layer head counts and a sigmoid MoE router.
- Introduced DEIMv2 (DETR with Improved Matching v2), a real-time object detection model spanning eight sizes, utilizing DINOv3 features and Spatial Tuning Adapters (STA) for larger variants.
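For readers unfamiliar with the term, a sigmoid MoE router scores each expert independently with a sigmoid instead of a softmax over all experts. The following is a generic sketch of that idea in plain Python, not Laguna's actual implementation: the function name `sigmoid_route`, the top-k selection, and the renormalization of the selected weights are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_route(logits: list[float], top_k: int) -> dict[int, float]:
    """Pick the top_k experts by sigmoid score and renormalize their weights.

    Illustrative only: each expert is scored independently, so the raw
    scores do not sum to 1 the way softmax probabilities would.
    """
    scores = [sigmoid(z) for z in logits]
    # Indices of the highest-scoring experts.
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    # Assumed design choice: rescale so the selected weights sum to 1.
    return {i: scores[i] / total for i in chosen}


# Router logits for one token over four hypothetical experts.
weights = sigmoid_route([2.0, -1.0, 0.5, 0.0], top_k=2)
print(weights)
```

Because the scores are independent, raising one expert's logit does not lower the raw score of another, which is the main practical difference from softmax routing.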
🐛 Bug Fixes
- Fixed cross-attention cache layer type error for T5Gemma2 when processing long inputs.
- Fixed incorrect cached forward behavior in Qwen3.5's gated-delta-net linear attention for multi-token sequences.
- Resolved a crash in GraniteMoeHybrid attention-only models caused by `_update_mamba_mask`.
- Updated attention function dispatch to align with the latest model implementations.
- Reverted a change that caused AutoTokenizer to initialize the wrong tokenizer class, fixing regressions in models like DeepSeek R1.
- Corrected KV deduplication and memory estimation for long sequences (16K+) during continuous batching generation.
- Removed misleading warnings about unsupported features like `num_return_sequences` that incorrectly fired during continuous batching.
- Fixed configuration reading and error handling for kernels, specifically for FP8 checkpoints (e.g., Qwen3.5-35B-A3B-FP8).
- Enabled custom expert kernels registered from the Hugging Face Hub to be properly loaded.
- Resolved an incompatibility preventing Gemma3n and Gemma4 from using the rotary kernel.
- Fixed a `ValueError` in `zero_shot_object_detection` when running on Python 3.13.
- Fixed pageable host-to-device (H2D) copies in the Gated DeltaNet PyTorch fallback.
- Fixed `UnboundLocalError` in `shard_and_distribute_module` when dealing with replicated parameters.
- Fixed `NameError: PeftConfigLike` triggered by `PreTrainedModel.__init_subclass__`.
- Skipped `clean_up_tokenization` for BPE tokenizers in `PreTrainedTokenizerFast`.
- Fixed an issue where the Whisper model returned the wrong language.
- Fixed computation of auxiliary losses when denoising is disabled in D-FINE.
- Fixed `AttributeError` on `s_aux=None` in `flash_attention_forward`.
- Prevented indexing past decoded characters when dealing with special tokens during decoding.