
v5.7.0

📦 transformers
2 features · 🐛 19 fixes · 🔧 14 symbols

Summary

This release introduces two major new model families, Laguna and DEIMv2, alongside numerous fixes for attention mechanisms, continuous batching generation, and kernel loading across various models.

Migration Steps

  1. If AutoTokenizer was initializing the wrong tokenizer class for you (e.g., for DeepSeek R1), upgrade to this release: the change that caused the regression has been reverted, which should resolve the issue.
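
A minimal sketch for checking whether the installed `transformers` is new enough to include the revert, assuming it first ships in v5.7.0 as stated here. The helper names `parse_release`, `has_tokenizer_revert`, and `installed_transformers_version` are hypothetical and not part of the library; pre-release suffixes such as `.dev0` are ignored for simplicity.

```python
import re
from importlib import metadata


def parse_release(version):
    """Return the leading (major, minor, patch) of a version string as ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_tokenizer_revert(version):
    """True if `version` is at least 5.7.0 (assumed first release with the revert)."""
    return parse_release(version) >= (5, 7, 0)


def installed_transformers_version():
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_transformers_version()
    if installed is None:
        print("transformers is not installed")
    elif has_tokenizer_revert(installed):
        print(f"transformers {installed} includes the AutoTokenizer revert")
    else:
        print(f"transformers {installed} predates v5.7.0; consider upgrading")
```

If the wrong tokenizer class still appears after upgrading, comparing `type(tokenizer).__name__` against the class named in the model's tokenizer configuration is a reasonable next check.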

✨ New Features

  • Added support for the Laguna mixture-of-experts language model family, featuring per-layer head counts and a sigmoid MoE router.
  • Introduced DEIMv2 (DETR with Improved Matching v2), a real-time object detection model spanning eight sizes, utilizing DINOv3 features and Spatial Tuning Adapters (STA) for larger variants.

🐛 Bug Fixes

  • Fixed cross-attention cache layer type error for T5Gemma2 when processing long inputs.
  • Fixed incorrect cached forward behavior in Qwen3.5's gated-delta-net linear attention for multi-token sequences.
  • Resolved a crash in attention-only GraniteMoeHybrid models caused by `_update_mamba_mask`.
  • Updated attention function dispatch to align with the latest model implementations.
  • Reverted a change that caused AutoTokenizer to initialize the wrong tokenizer class, fixing regressions in models like DeepSeek R1.
  • Corrected KV deduplication and memory estimation for long sequences (16K+) during continuous batching generation.
  • Removed misleading warnings about unsupported features like `num_return_sequences` that incorrectly fired during continuous batching.
  • Fixed configuration reading and error handling for kernels, specifically for FP8 checkpoints (e.g., Qwen3.5-35B-A3B-FP8).
  • Enabled custom expert kernels registered from the Hugging Face Hub to be properly loaded.
  • Resolved an incompatibility preventing Gemma3n and Gemma4 from using the rotary kernel.
  • Fixed a `ValueError` in `zero_shot_object_detection` for Python 3.13 compatibility.
  • Fixed pageable host-to-device (H2D) copies in the Gated DeltaNet PyTorch fallback.
  • Fixed `UnboundLocalError` in `shard_and_distribute_module` when dealing with replicated parameters.
  • Fixed `NameError: PeftConfigLike` triggered by `PreTrainedModel.__init_subclass__`.
  • Skipped `clean_up_tokenization` for BPE tokenizers in `PreTrainedTokenizerFast`.
  • Fixed an issue where the Whisper model returned the wrong language.
  • Fixed computation of auxiliary losses when denoising is disabled in D-FINE.
  • Fixed `AttributeError` on `s_aux=None` in `flash_attention_forward`.
  • Prevented indexing past decoded characters when dealing with special tokens during decoding.
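
For a sense of why KV memory estimation matters at the 16K+ sequence lengths mentioned in the continuous batching fix above, the standard back-of-the-envelope formula for KV cache size can be sketched as follows. This is the generic textbook estimate, not the estimator used inside the library, and the model dimensions in the example are hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimate KV cache size in bytes for one sequence.

    The factor 2 accounts for storing both keys and values; bytes_per_value
    is 2 for 16-bit formats such as fp16 or bf16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model: 32 layers, 8 KV heads of dimension 128, 16K tokens, bf16.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=16 * 1024)
print(f"{size / 1024**3:.1f} GiB per sequence")
```

Because the estimate grows linearly with sequence length and is paid per sequence in the batch, small errors in it can translate into large over- or under-allocation once many long requests are batched together.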

Affected Symbols