v5.5.0 · 📦 transformers
⚠ 2 breaking · ✨ 4 features · 🐛 13 fixes · 🔧 14 symbols
Summary
This release introduces three major new models: Gemma 4 (multimodal), NomicBERT (long-context text embedding), and Music Flamingo (audio-language). It also includes significant breaking changes related to native cache handling for Mamba models and security updates for LightGlue.
⚠️ Breaking Changes
- Mamba and hybrid model caches must now use the new native cache classes; previous workarounds for Mamba-based or hybrid models are obsolete.
- Remote code execution support has been removed from the native LightGlue integration. Users loading LightGlue must remove the `trust_remote_code=True` argument and use the model via the standard native API.
Migration Steps
- Update code to use native cache classes for Mamba-based or hybrid models instead of previous workarounds.
- Remove `trust_remote_code=True` when loading LightGlue and use the model directly through the standard native API.
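The second step can be sketched with a small, hypothetical helper (the helper name and the kwargs shown are illustrative, not part of the transformers API): it strips the now-unsupported `trust_remote_code` flag from a `from_pretrained()` keyword dict before loading LightGlue through the native integration.

```python
# Hypothetical migration helper: remove the `trust_remote_code` flag that
# LightGlue loads no longer need (the native integration replaces remote code).
def migrate_load_kwargs(kwargs: dict) -> dict:
    """Return a copy of from_pretrained() kwargs with trust_remote_code removed."""
    cleaned = dict(kwargs)
    cleaned.pop("trust_remote_code", None)  # obsolete for native LightGlue
    return cleaned

# Before (pre-v5.5) load kwargs; the extra flag is illustrative:
old_kwargs = {"trust_remote_code": True, "dtype": "float16"}

# After: the same load goes through the standard native API.
new_kwargs = migrate_load_kwargs(old_kwargs)
print(new_kwargs)  # {'dtype': 'float16'}
# model = AutoModel.from_pretrained("<lightglue-checkpoint>", **new_kwargs)
```

The commented-out `AutoModel.from_pretrained` line shows where the cleaned kwargs would be used; substitute your actual checkpoint id.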
✨ New Features
- Added support for Gemma 4, a multimodal model available in 1B, 13B, and 27B parameter sizes, featuring a vision processor that handles variable-sized images with a fixed token budget and spatial 2D RoPE.
- Introduced NomicBERT, a BERT-style encoder that uses RoPE to produce reproducible long-context text embeddings (8192-token context length), outperforming Ada-002 and text-embedding-3-small on relevant benchmarks.
- Added Music Flamingo, a large audio–language model built on Audio Flamingo 3 that uses Rotary Time Embeddings (RoTE) to handle audio sequences up to 20 minutes long.
- Improved performance of repository checks (`check-repo`) by introducing file-level and AST-level disk caching, resulting in up to a 27x speedup with a warm cache.
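The fixed-token-budget idea behind the Gemma 4 vision processor can be sketched as follows. This is a simplified illustration, not the actual processor: the function name, the `patch=14` size, and the `budget=256` token limit are assumed values chosen for the example.

```python
import math

def fit_to_token_budget(width: int, height: int, patch: int = 14, budget: int = 256):
    """Scale an image so its patch-token count fits a fixed budget.

    Simplified sketch of a fixed-token-budget vision processor: any image,
    whatever its original size, is downscaled (preserving aspect ratio and
    snapping to whole patches) until (w // patch) * (h // patch) <= budget.
    """
    tokens = (width // patch) * (height // patch)
    if tokens <= budget:
        return width, height  # already within budget, leave untouched
    scale = math.sqrt(budget / tokens)  # uniform scale on both axes
    new_w = max(patch, int(width * scale) // patch * patch)
    new_h = max(patch, int(height * scale) // patch * patch)
    return new_w, new_h

w, h = fit_to_token_budget(1920, 1080)
print(w, h, (w // 14) * (h // 14))  # resized dims; token count now within budget
```

The square-root scale works because token count grows with the product of both dimensions, so scaling each axis by `sqrt(budget / tokens)` shrinks the product by roughly `budget / tokens`.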
🐛 Bug Fixes
- Corrected the Gemma vision mask to support video inputs.
- Resolved a dependency issue that incorrectly required torchvision for PIL-based image processors.
- Fixed bugs in the Janus image generation model and in image loading.
- Corrected local code resolution for tokenizers and image processors.
- Fixed resized LM head weights being overwritten by `post_init`.
- Added `_tp_plan` to `ForConditionalGeneration` for Qwen3.5 MoE.
- Fixed a dtype mismatch in SwitchTransformers and `TimmWrapperModel`.
- Corrected type annotations across config classes for `@strict` validation.
- Fixed a `T5Attention` shape mismatch under tensor parallelism.
- Re-added regex substitutions to the response parsing spec.
- Fixed an incorrect `TrainingArguments` example in `training.md`.
- Fixed continuous batching JSON response serialization for serving.
- Fixed the mlinter cache location in `.gitignore`.