v4.51.3-InternVL-preview
📦 transformers
Summary
This preview release introduces support for the InternVL 2.5 and 3 family of multimodal models, featuring a native multimodal pre-training paradigm and state-of-the-art performance on visual-linguistic tasks.
Migration Steps
- Install the preview version: `pip install git+https://github.com/huggingface/transformers@v4.51.3-InternVL-preview`
- Use `processor.apply_chat_template()` to correctly format prompts for InternVL chat models.
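The steps above can be sketched as follows. This is a minimal, non-authoritative example: the checkpoint name `OpenGVLab/InternVL3-1B-hf` and the image URL are assumptions for illustration, and running the model requires the preview release and network access (the `RUN_MODEL` guard keeps the sketch inert by default).

```python
# Hedged sketch of prompt formatting for an InternVL chat model.
# Checkpoint name and image URL below are assumptions for illustration.
RUN_MODEL = False  # set True with the preview release installed and network access


def build_messages(image_url: str, question: str) -> list:
    """Build one user turn in the multimodal chat format that
    apply_chat_template expects: a list of typed content blocks."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "url": image_url},
            {"type": "text", "text": question},
        ],
    }]


if RUN_MODEL:
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "OpenGVLab/InternVL3-1B-hf"  # assumed checkpoint name
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id)

    messages = build_messages(
        "http://images.cocodataset.org/val2017/000000039769.jpg",
        "Describe this image.",
    )
    # apply_chat_template inserts the model-specific image placeholder
    # tokens, then tokenizes the whole conversation in one call.
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
    output = model.generate(**inputs, max_new_tokens=50)
    print(processor.decode(output[0], skip_special_tokens=True))
```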
✨ New Features
- Added support for InternVL (2.5 & 3) Visual Language Models.
- Support for InternVL3's native multimodal pre-training paradigm and variable visual position encoding (V2PE).
- Support for image-text-to-text pipeline inference.
- Support for batched image and text inputs.
- Support for text-only generation using multimodal checkpoints.
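A rough sketch tying the last three features together: batched inference through the `image-text-to-text` pipeline, mixing an image-plus-text conversation with a text-only one against the same multimodal checkpoint. The checkpoint name is an assumption, and the guarded block is a sketch rather than a definitive invocation.

```python
# Hedged sketch: batched image-text-to-text pipeline inference, including
# a text-only prompt served by the same multimodal checkpoint.
RUN_MODEL = False  # set True with the preview release installed and network access


def build_batch(image_url: str) -> list:
    """Two conversations to batch together: one with an image, one
    text-only, showing that multimodal checkpoints also accept plain text."""
    with_image = [{
        "role": "user",
        "content": [
            {"type": "image", "url": image_url},
            {"type": "text", "text": "What is in this image?"},
        ],
    }]
    text_only = [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Write a haiku about autumn."},
        ],
    }]
    return [with_image, text_only]


if RUN_MODEL:
    from transformers import pipeline

    # "OpenGVLab/InternVL3-1B-hf" is an assumed checkpoint name.
    pipe = pipeline("image-text-to-text", model="OpenGVLab/InternVL3-1B-hf")
    batch = build_batch("http://images.cocodataset.org/val2017/000000039769.jpg")
    outputs = pipe(text=batch, max_new_tokens=50)
    for out in outputs:
        print(out)
```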