Change8

v4.49.0-SmolVLM-2

📦 transformersView on GitHub →
4 features🔧 4 symbols

Summary

This release introduces SmolVLM-2, a lightweight vision-language model based on Idefics3 and SmolLM2 that supports multi-image and video processing.

Migration Steps

  1. Install the specific release tag using: pip install git+https://github.com/huggingface/transformers@v4.49.0-SmolVLM-2

✨ New Features

  • Added support for SmolVLM-2 model family.
  • Support for multi-image and video inputs.
  • Integration of SmolLM2 as the text backbone for the vision-language model.
  • Customizable image resizing and patching via SmolVLMImageProcessor parameters (do_resize, size, max_image_size).

🔧 Affected Symbols

SmolVLM2SmolVLMImageProcessorAutoProcessorAutoModelForImageTextToText