v4.49.0-SmolVLM-2
📦 transformersView on GitHub →
✨ 4 features🔧 4 symbols
Summary
This release introduces SmolVLM-2, a lightweight vision-language model based on Idefics3 and SmolLM2 that supports multi-image and video processing.
Migration Steps
- Install the specific release tag using: pip install git+https://github.com/huggingface/transformers@v4.49.0-SmolVLM-2
✨ New Features
- Added support for SmolVLM-2 model family.
- Support for multi-image and video inputs.
- Integration of SmolLM2 as the text backbone for the vision-language model.
- Customizable image resizing and patching via SmolVLMImageProcessor parameters (do_resize, size, max_image_size).
🔧 Affected Symbols
SmolVLM2SmolVLMImageProcessorAutoProcessorAutoModelForImageTextToText