v5.5.0 · 📦 transformers
⚠ 2 breaking · ✨ 4 features · 🐛 13 fixes · 🔧 14 symbols
Summary
This release introduces three major new models: Gemma 4 (multimodal), NomicBERT (long-context text embedding), and Music Flamingo (audio-language). It also includes significant breaking changes related to native cache handling for Mamba models and security updates for LightGlue.
⚠️ Breaking Changes
- Mamba and hybrid model caches must now use the new native cache classes; previous workarounds for Mamba-based or hybrid models are obsolete.
- Remote code execution support has been removed from the native LightGlue integration. Users loading LightGlue must remove the `trust_remote_code=True` argument and use the model via the standard native API.
Migration Steps
- Update code to use native cache classes for Mamba-based or hybrid models instead of previous workarounds.
- Remove `trust_remote_code=True` when loading LightGlue and use the model directly through the standard native API.
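The second step can be sketched with a small, hypothetical helper (the helper name and the kwargs shown are illustrative, not part of the transformers API): it strips the now-unsupported `trust_remote_code` flag from a `from_pretrained()` keyword dict before loading LightGlue through the native integration.

```python
# Hypothetical migration helper: remove the `trust_remote_code` flag that
# LightGlue loads no longer need (the native integration replaces remote code).
def migrate_load_kwargs(kwargs: dict) -> dict:
    """Return a copy of from_pretrained() kwargs with trust_remote_code removed."""
    cleaned = dict(kwargs)
    cleaned.pop("trust_remote_code", None)  # obsolete for native LightGlue
    return cleaned

# Before (pre-v5.5) load kwargs; the extra flag is illustrative:
old_kwargs = {"trust_remote_code": True, "dtype": "float16"}

# After: the same load goes through the standard native API.
new_kwargs = migrate_load_kwargs(old_kwargs)
print(new_kwargs)  # {'dtype': 'float16'}
# model = AutoModel.from_pretrained("<lightglue-checkpoint>", **new_kwargs)
```

The commented-out `AutoModel.from_pretrained` line shows where the cleaned kwargs would be used; substitute your actual checkpoint id.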
✨ New Features
- Added support for Gemma 4, a multimodal model available in 1B, 13B, and 27B parameter sizes, featuring a vision processor that handles variable-sized images with a fixed token budget and spatial 2D RoPE.
- Introduced NomicBERT, a BERT-style encoder that uses RoPE to produce reproducible long-context text embeddings (8192-token context length), outperforming Ada-002 and text-embedding-3-small on relevant benchmarks.
- Added Music Flamingo, a large audio–language model built on Audio Flamingo 3 that uses Rotary Time Embeddings (RoTE) to handle audio sequences up to 20 minutes long.
- Improved performance of repository checks (`check-repo`) by introducing file-level and AST-level disk caching, resulting in up to a 27x speedup with a warm cache.
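The fixed-token-budget idea behind the Gemma 4 vision processor can be sketched as follows. This is a simplified illustration, not the actual processor: the function name, the `patch=14` size, and the `budget=256` token limit are assumed values chosen for the example.

```python
import math

def fit_to_token_budget(width: int, height: int, patch: int = 14, budget: int = 256):
    """Scale an image so its patch-token count fits a fixed budget.

    Simplified sketch of a fixed-token-budget vision processor: any image,
    whatever its original size, is downscaled (preserving aspect ratio and
    snapping to whole patches) until (w // patch) * (h // patch) <= budget.
    """
    tokens = (width // patch) * (height // patch)
    if tokens <= budget:
        return width, height  # already within budget, leave untouched
    scale = math.sqrt(budget / tokens)  # uniform scale on both axes
    new_w = max(patch, int(width * scale) // patch * patch)
    new_h = max(patch, int(height * scale) // patch * patch)
    return new_w, new_h

w, h = fit_to_token_budget(1920, 1080)
print(w, h, (w // 14) * (h // 14))  # resized dims; token count now within budget
```

The square-root scale works because token count grows with the product of both dimensions, so scaling each axis by `sqrt(budget / tokens)` shrinks the product by roughly `budget / tokens`.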
🐛 Bug Fixes
- Corrected the Gemma vision mask to support video inputs.
- Resolved a dependency issue that incorrectly required torchvision for PIL-based image processors.
- Fixed bugs in the Janus image generation model and in image loading.
- Corrected local code resolution for tokenizers and image processors.
- Fixed resized LM head weights being overwritten by `post_init`.
- Added `_tp_plan` to `ForConditionalGeneration` for Qwen3.5 MoE.
- Fixed a dtype mismatch in SwitchTransformers and `TimmWrapperModel`.
- Corrected type annotations across config classes for `@strict` validation.
- Fixed a `T5Attention` shape mismatch under tensor parallelism.
- Re-added regex substitutions to the response parsing spec.
- Fixed an incorrect `TrainingArguments` example in `training.md`.
- Fixed continuous batching JSON response serialization for serving.
- Fixed the mlinter cache location in `.gitignore`.