v4.49.0-SmolVLM-2

📅 Feb 20, 2025📦 transformersView on GitHub →

✨ 4 features🔧 4 symbols

Summary

This release introduces SmolVLM-2, a lightweight vision-language model based on Idefics3 and SmolLM2 that supports multi-image and video processing.

Migration Steps

Install the specific release tag using: pip install git+https://github.com/huggingface/transformers@v4.49.0-SmolVLM-2

✨ New Features

Added support for SmolVLM-2 model family.
Support for multi-image and video inputs.
Integration of SmolLM2 as the text backbone for the vision-language model.
Customizable image resizing and patching via SmolVLMImageProcessor parameters (do_resize, size, max_image_size).

Affected Symbols

SmolVLM2 SmolVLMImageProcessor AutoProcessor AutoModelForImageTextToText