v5.5.0

Breaking Changes
📦 transformers
⚠️ 2 breaking · ✨ 4 features · 🐛 13 fixes · 🔧 14 symbols

Summary

This release introduces three major new models: Gemma 4 (multimodal), NomicBERT (long-context text embedding), and Music Flamingo (audio-language). It also includes two breaking changes: native cache handling for Mamba models and the removal of remote-code execution support from the LightGlue integration.

⚠️ Breaking Changes

  • Mamba and hybrid model caches must now use the new native cache classes; previous workarounds for Mamba-based or hybrid models are obsolete.
  • Remote code execution support has been removed from the native LightGlue integration. Users loading LightGlue must remove the `trust_remote_code=True` argument and use the model via the standard native API.

Migration Steps

  1. Update code to use native cache classes for Mamba-based or hybrid models instead of previous workarounds.
  2. Remove `trust_remote_code=True` when loading LightGlue and use the model directly through the standard native API.
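
Step 2 can be illustrated with a minimal sketch. The helper function and the checkpoint placeholder below are hypothetical, introduced only to show the shape of the change; they are not part of the transformers API:

```python
# Hypothetical helper illustrating the LightGlue migration: strip the
# now-unsupported `trust_remote_code` flag before calling the native loader.
def lightglue_load_kwargs(**kwargs):
    """Return load kwargs with `trust_remote_code` removed."""
    kwargs.pop("trust_remote_code", None)
    return kwargs

# Before (no longer supported):
#   AutoModel.from_pretrained("<lightglue-checkpoint>", trust_remote_code=True)
# After, through the standard native API:
#   AutoModel.from_pretrained("<lightglue-checkpoint>", **lightglue_load_kwargs())
```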

✨ New Features

  • Added support for Gemma 4, a multimodal model available in 1B, 13B, and 27B parameter sizes, featuring a vision processor that handles variable-sized images using a fixed token budget and spatial 2D RoPE.
  • Introduced NomicBERT, a BERT-inspired encoder that uses RoPE for reproducible long-context text embeddings (8192-token context), outperforming Ada-002 and text-embedding-3-small on relevant benchmarks.
  • Added Music Flamingo, a large audio–language model built on Audio Flamingo 3, incorporating Rotary Time Embeddings (RoTE) to handle audio sequences up to 20 minutes.
  • Improved performance of repository checks (`check-repo`) by introducing file-level and AST-level disk caching, resulting in up to a 27x speedup with a warm cache.
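
The file-level caching idea behind the `check-repo` speedup can be sketched as follows. This is a minimal illustration under assumed names and cache layout, not the actual implementation: results are keyed on a content hash, so a warm cache skips the expensive check entirely for unchanged files.

```python
import hashlib
import json
import pathlib
import tempfile

# Illustrative cache directory; the real tool's location differs.
CACHE_DIR = pathlib.Path(tempfile.gettempdir()) / "check_repo_cache"

def check_file(path: pathlib.Path) -> dict:
    """Run an expensive per-file check, caching the result on disk
    keyed by the file's content hash (warm cache -> no recomputation)."""
    CACHE_DIR.mkdir(exist_ok=True)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    cache_file = CACHE_DIR / f"{digest}.json"
    if cache_file.exists():
        # Warm cache: the file content is unchanged, reuse the stored result.
        return json.loads(cache_file.read_text())
    # Cold cache: run the check (a trivial line count stands in here).
    result = {"lines": len(path.read_text().splitlines())}
    cache_file.write_text(json.dumps(result))
    return result
```

Because the key is the content hash rather than the path or mtime, moving or touching a file does not invalidate its entry; only an actual edit does.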

🐛 Bug Fixes

  • Corrected the Gemma vision mask to support video inputs.
  • Resolved a dependency issue that incorrectly required torchvision for PIL-based image processors.
  • Patched bugs in the Janus image generation model and image loading.
  • Corrected local code resolution for tokenizers and image processors.
  • Fixed resized LM head weights being overwritten by `post_init`.
  • Added `_tp_plan` to the Qwen3.5 MoE ForConditionalGeneration class.
  • Fixed dtype mismatch in SwitchTransformers and TimmWrapperModel.
  • Corrected type annotations across config classes for `@strict` validation.
  • Fixed T5Attention shape mismatch under Tensor Parallelism.
  • Re-added regex substitutions to the response parsing spec.
  • Fixed an incorrect `TrainingArguments` example in `training.md`.
  • Fixed continuous batching JSON response serialization for serving.
  • Fixed the `mlinter` cache path in `.gitignore`.

Affected Symbols