
v4.48.0

📦 transformers · 8 features · 🔧 10 symbols

Summary

This release introduces several major model architectures, including ModernBERT, Aria (a multimodal MoE), and Bamba (Mamba-2 based), and adds a TimmWrapper that integrates timm library models directly into the Transformers ecosystem.

Migration Steps

  1. If you were using older BERT or RoBERTa models, evaluate the new ModernBERT model for potential performance and efficiency gains, especially on long sequences (up to 8192 tokens); a minimal sketch follows this list.
  2. If you are working with multimodal (vision-and-language) tasks, investigate the new Aria model (see the hedged example after this list).
  3. If you need to integrate models from the timm library into the Hugging Face ecosystem (e.g., for image classification pipelines or Trainer integration), use AutoModelForImageClassification.from_pretrained() with a timm checkpoint, which now routes through the TimmWrapper (sketch below).
  4. If you are using Pixtral models, update your configuration or checkpoint-loading logic if you need the newly supported Pixtral-Large variant.
  5. If you are working on document retrieval over document images, explore the ColPali model (example after this list).
  6. If you are looking for new base models focused on code, STEM, and multilingual capabilities, check out the Falcon3 family (Falcon3-1B-Base, Falcon3-3B-Base, Falcon3-Mamba-7B-Base, Falcon3-7B-Base, and Falcon3-10B-Base); a generation sketch closes out the examples below.
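
A minimal sketch for step 1, assuming the publicly released answerdotai/ModernBERT-base checkpoint. ModernBERT is a masked-language encoder, so the fill-mask pipeline is the quickest smoke test:

```python
from transformers import pipeline

# ModernBERT is a bidirectional (masked-language) encoder, so fill-mask is
# the simplest way to exercise it. The checkpoint name is an assumption based
# on the ModernBERT release on the Hugging Face Hub.
fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
print(fill_mask("The capital of France is [MASK]."))
```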
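
For step 2, a hedged sketch of image-grounded generation with Aria. The rhymes-ai/Aria checkpoint, the AriaProcessor chat-template message format, and the generation flow follow the Aria model documentation and should be treated as assumptions:

```python
import torch
from PIL import Image
from transformers import AriaForConditionalGeneration, AriaProcessor

name = "rhymes-ai/Aria"  # assumed Hub checkpoint from the Aria release
processor = AriaProcessor.from_pretrained(name)
model = AriaForConditionalGeneration.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("photo.png")  # any local image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is shown in this image?"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```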
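
For step 3, loading a timm checkpoint through the standard Auto classes, which now route through TimmWrapper. timm/resnet50.a1_in1k is just one example checkpoint from the timm namespace on the Hub, and model.config.id2label is assumed to be populated from the timm label metadata:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

checkpoint = "timm/resnet50.a1_in1k"  # example; any timm Hub checkpoint should work
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)  # wrapped timm model

image = Image.open("cat.jpg")  # any local image
inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```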
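
For step 5, a sketch of multi-vector document retrieval with ColPali. The vidore/colpali-v1.2-hf checkpoint and the processor.score_retrieval() helper follow the ColPali model docs; treat both as assumptions:

```python
import torch
from PIL import Image
from transformers import ColPaliForRetrieval, ColPaliProcessor

name = "vidore/colpali-v1.2-hf"  # assumed transformers-format checkpoint
model = ColPaliForRetrieval.from_pretrained(name, torch_dtype=torch.bfloat16)
processor = ColPaliProcessor.from_pretrained(name)

images = [Image.open("page1.png"), Image.open("page2.png")]  # document page images
queries = ["total revenue in 2023", "system architecture diagram"]

with torch.no_grad():
    image_embeddings = model(**processor(images=images, return_tensors="pt")).embeddings
    query_embeddings = model(**processor(text=queries, return_tensors="pt")).embeddings

# Late-interaction (MaxSim) scoring: one row per query, one column per image.
scores = processor.score_retrieval(query_embeddings, image_embeddings)
print(scores)
```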
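
Finally, for step 6, Falcon3 base models load through the regular causal-LM API; tiiuae/Falcon3-7B-Base is the assumed checkpoint name from the Falcon3 release:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "tiiuae/Falcon3-7B-Base"  # assumed Hub checkpoint from the Falcon3 release
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="auto", device_map="auto")

# Base (non-instruct) model, so plain completion rather than chat.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```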

✨ New Features

  • Added ModernBERT: A modern bidirectional encoder with Rotary Positional Embeddings, Unpadding, GeGLU, and Alternating Attention, supporting sequences of up to 8192 tokens.
  • Added Aria: An open multimodal-native Mixture-of-Experts (MoE) model.
  • Added TimmWrapper: Allows timm models to be loaded as Transformers models with support for pipelines, Trainer, and quantization.
  • Added ColPali: A VLM-based model for efficient document retrieval using multi-vector embeddings.
  • Added Falcon3: A new family of models (1B, 3B, 7B, 10B, and Mamba-7B) with improved science, math, and code capabilities.
  • Added Bamba: A 9B parameter decoder-only model based on the Mamba-2 architecture.
  • Added VitPose: A vision transformer-based model for human pose estimation (a usage sketch follows this list).
  • Updated Pixtral modeling and conversion scripts to support Pixtral-Large.
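
Pose estimation has a less familiar call pattern than the text models above, so here is a hedged sketch for VitPose. The usyd-community/vitpose-base-simple checkpoint, the COCO-format boxes argument, and post_process_pose_estimation() follow the VitPose model docs and are assumptions, not verified here:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, VitPoseForPoseEstimation

name = "usyd-community/vitpose-base-simple"  # assumed Hub checkpoint
processor = AutoImageProcessor.from_pretrained(name)
model = VitPoseForPoseEstimation.from_pretrained(name)

image = Image.open("people.jpg")  # any local image containing people
# One list of person bounding boxes per image, in COCO (x, y, w, h) format.
# VitPose is a top-down model: boxes normally come from a person detector.
boxes = [[[100.0, 50.0, 200.0, 400.0]]]

inputs = processor(image, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

pose_results = processor.post_process_pose_estimation(outputs, boxes=boxes)
for person in pose_results[0]:  # results for the first (only) image
    print(person["keypoints"], person["scores"])
```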

🔧 Affected Symbols

ModernBert, Aria, TimmWrapper, AutoModelForImageClassification, AutoImageProcessor, ColPali, Falcon3, Bamba, VitPose, Pixtral