v4.49.0-Gemma-3

📅 Mar 18, 2025📦 transformersView on GitHub →

✨ 6 features🔧 4 symbols

Summary

This release introduces Google's Gemma 3 multimodal models to the transformers library, featuring a SigLIP vision encoder and Gemma 2 language decoder with support for high-resolution image cropping and multi-image inference.

Migration Steps

Install the specific release tag using: pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3

✨ New Features

Added support for Gemma 3, a multimodal vision-language model.
Introduced Gemma3ForConditionalGeneration for image+text and image-only inputs.
Introduced Gemma3ForCausalLM for optimized text-only generation.
Support for multi-image inputs within a single sample.
Added 'pan and scan' image cropping (do_pan_and_scan=True) to improve resolution for tasks like DocVQA and OCR.
Processor support for apply_chat_template with multimodal input handling.

Affected Symbols

Gemma3ForConditionalGeneration Gemma3ForCausalLM Gemma3Processor AutoProcessor