v4.50.0
📦 transformers
✨ 7 features · 🐛 1 fix · 🔧 8 symbols
Summary
Release v4.50.0 introduces a new model-based release strategy and adds support for several major vision-language models, including Gemma 3, Aya Vision, Mistral 3.1, and SigLIP-2.
Migration Steps
- Review the new model-based release mechanism. Model releases (e.g., v4.49.0-Gemma-3) are tagged on GitHub and are not pushed to PyPI, so they must be installed directly from the tag.
- If you intend to use the newly released models (Gemma 3, ShieldGemma 2, Aya Vision, Mistral 3.1), pin the correct model tag when installing and loading weights, as these tags may be updated with bug fixes; see the sketch after this list.
- If you were using older versions of Gemma or related models, check the Gemma 3 and ShieldGemma 2 documentation for any configuration changes related to vision encoding or safety filtering.
- If integrating Aya Vision, be aware of its multilingual capabilities (23 languages) and its underlying components: a SigLIP2-so400m-patch14-384 vision encoder and CommandR-based language models.
✨ New Features
- Introduction of model-based releases (tags) for faster model availability between software releases.
- Added Gemma 3: A vision-language model pairing a SigLIP vision encoder with a Gemma 2-based language decoder that uses bidirectional attention over image tokens (see the usage sketch after this list).
- Added ShieldGemma 2: A 4B-parameter safety model for filtering synthetic and natural images.
- Added Aya Vision (8B and 32B): Multilingual multimodal models supporting 23 languages.
- Added Mistral 3.1: A 24B-parameter model with vision understanding and a 128k context window.
- Added SmolVLM2: A lightweight VLM based on SmolLM2 that supports multi-image and video inputs.
- Added SigLIP-2: Improved vision-language encoders with support for NaFlex (variable aspect ratios and resolutions).
🐛 Bug Fixes
- Model-specific fixes are now merged into dedicated release tags (e.g., v4.49.0-Gemma-3) so that fixes reach users without waiting for the next monthly software release.
🔧 Affected Symbols
Gemma3, ShieldGemma2, AyaVision, Mistral3, SmolVLM2, SigLIP2, SigLIP, Gemma2