v4.51.3-InternVL-preview
📦 transformers
Summary
This preview release introduces support for the InternVL 2.5 and 3 family of multimodal models, featuring a native multimodal pre-training paradigm and state-of-the-art performance on visual-linguistic tasks.
Migration Steps
- Install the preview version: `pip install git+https://github.com/huggingface/transformers@v4.51.3-InternVL-preview`
- Use `processor.apply_chat_template()` to correctly format prompts for InternVL chat models.
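The steps above can be sketched as follows. This is a minimal, non-authoritative example: the checkpoint name `OpenGVLab/InternVL3-1B-hf` and the image URL are assumptions for illustration, and running the model requires the preview release and network access (the `RUN_MODEL` guard keeps the sketch inert by default).

```python
# Hedged sketch of prompt formatting for an InternVL chat model.
# Checkpoint name and image URL below are assumptions for illustration.
RUN_MODEL = False  # set True with the preview release installed and network access


def build_messages(image_url: str, question: str) -> list:
    """Build one user turn in the multimodal chat format that
    apply_chat_template expects: a list of typed content blocks."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "url": image_url},
            {"type": "text", "text": question},
        ],
    }]


if RUN_MODEL:
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "OpenGVLab/InternVL3-1B-hf"  # assumed checkpoint name
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id)

    messages = build_messages(
        "http://images.cocodataset.org/val2017/000000039769.jpg",
        "Describe this image.",
    )
    # apply_chat_template inserts the model-specific image placeholder
    # tokens, then tokenizes the whole conversation in one call.
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
    output = model.generate(**inputs, max_new_tokens=50)
    print(processor.decode(output[0], skip_special_tokens=True))
```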
✨ New Features
- Added support for InternVL (2.5 & 3) Visual Language Models.
- Support for InternVL3's native multimodal pre-training paradigm and variable visual position encoding (V2PE).
- Support for image-text-to-text pipeline inference.
- Support for batched image and text inputs.
- Support for text-only generation using multimodal checkpoints.
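A rough sketch tying the last three features together: batched inference through the `image-text-to-text` pipeline, mixing an image-plus-text conversation with a text-only one against the same multimodal checkpoint. The checkpoint name is an assumption, and the guarded block is a sketch rather than a definitive invocation.

```python
# Hedged sketch: batched image-text-to-text pipeline inference, including
# a text-only prompt served by the same multimodal checkpoint.
RUN_MODEL = False  # set True with the preview release installed and network access


def build_batch(image_url: str) -> list:
    """Two conversations to batch together: one with an image, one
    text-only, showing that multimodal checkpoints also accept plain text."""
    with_image = [{
        "role": "user",
        "content": [
            {"type": "image", "url": image_url},
            {"type": "text", "text": "What is in this image?"},
        ],
    }]
    text_only = [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Write a haiku about autumn."},
        ],
    }]
    return [with_image, text_only]


if RUN_MODEL:
    from transformers import pipeline

    # "OpenGVLab/InternVL3-1B-hf" is an assumed checkpoint name.
    pipe = pipeline("image-text-to-text", model="OpenGVLab/InternVL3-1B-hf")
    batch = build_batch("http://images.cocodataset.org/val2017/000000039769.jpg")
    outputs = pipe(text=batch, max_new_tokens=50)
    for out in outputs:
        print(out)
```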