Change8

v4.51.3-InternVL-preview

📦 transformersView on GitHub →
5 features🔧 4 symbols

Summary

This preview release introduces support for the InternVL 2.5 and 3 family of multimodal models, featuring a native multimodal pre-training paradigm and state-of-the-art performance on visual-linguistic tasks.

Migration Steps

  1. Install the preview version using: pip install git+https://github.com/huggingface/transformers@v4.51.3-InternVL-preview
  2. Use processor.apply_chat_template() to correctly format prompts for InternVL chat models.

✨ New Features

  • Added support for InternVL (2.5 & 3) Visual Language Models.
  • Support for InternVL3 native multimodal pre-training paradigm and variable visual position encoding (V2PE).
  • Support for image-text-to-text pipeline inference.
  • Support for batched image and text inputs.
  • Support for text-only generation using multimodal checkpoints.

Affected Symbols