v4.48.0
📦 transformers
✨ 8 features · 🔧 10 symbols
Summary
This release introduces several major model architectures, including ModernBERT, Aria (MoE), and Bamba (Mamba-2), and adds a TimmWrapper that integrates timm library models directly into the Transformers ecosystem.
Migration Steps
- If you were using older BERT or RoBERTa models, consider evaluating the new ModernBERT model for potential performance and efficiency improvements, especially for long sequences (up to 8192 tokens).
- If you are working with multimodal tasks (vision and language), investigate the new Aria model.
- If you need to integrate models from the timm library into the Hugging Face ecosystem (e.g., for image classification pipelines or Trainer integration), load a timm checkpoint directly with AutoModelForImageClassification.from_pretrained(), which now routes through the new TimmWrapper (see the sketch after this list).
- If you are using Pixtral models, update your configuration or checkpoint loading logic if you need to use the newly supported Pixtral-Large variant.
- If you are working on document retrieval tasks involving document images, explore the ColPali model.
- If you are looking for new base models focusing on code, STEM, and multilingual capabilities, check out the Falcon3 family of models (Falcon3-1B-Base, Falcon3-3B-Base, Falcon3-Mamba-7B-Base, Falcon3-7B-Base, and Falcon3-10B-Base).
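As a rough illustration of the timm integration mentioned above, the sketch below loads a timm image-classification checkpoint through the standard Auto classes. The checkpoint name timm/resnet18.a1_in1k and the image path are placeholders chosen for illustration; substitute your own.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Placeholder checkpoint: any timm image-classification checkpoint on the Hub
# should load the same way via the TimmWrapper.
checkpoint = "timm/resnet18.a1_in1k"

processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Report the top-scoring class; label names, when present in the checkpoint
# metadata, are exposed through model.config.id2label.
predicted_id = logits.argmax(-1).item()
print(predicted_id, model.config.id2label.get(predicted_id, "unknown"))
```

Because the wrapped model behaves like a regular Transformers model, the same checkpoint can also be used with the image-classification pipeline or passed to Trainer for fine-tuning.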
✨ New Features
- Added ModernBERT: A modern bidirectional encoder supporting sequences of up to 8192 tokens, built with rotary positional embeddings, unpadding, GeGLU activations, and alternating attention (see the sketch after this list).
- Added Aria: An open multimodal-native Mixture-of-Experts (MoE) model.
- Added TimmWrapper: Allows timm models to be loaded as Transformers models with support for pipelines, Trainer, and quantization.
- Added ColPali: A VLM-based model for efficient document retrieval using multi-vector embeddings.
- Added Falcon3: A new family of models (1B, 3B, 7B, 10B, and Mamba-7B) with improved science, math, and code capabilities.
- Added Bamba: A 9B parameter decoder-only model based on the Mamba-2 architecture.
- Added VitPose: A vision transformer-based model for human pose estimation.
- Updated Pixtral modeling and conversion scripts to support Pixtral-Large.
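To give a concrete feel for the ModernBERT addition, here is a minimal masked-language-modeling sketch using the standard Auto classes; the checkpoint name answerdotai/ModernBERT-base is assumed for illustration, so swap in whichever ModernBERT checkpoint you intend to use.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed checkpoint name, used here only for illustration.
checkpoint = "answerdotai/ModernBERT-base"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring token at the [MASK] position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_positions].argmax(-1)
print(tokenizer.decode(predicted_id))
```

The same checkpoint also works with the fill-mask pipeline; the 8192-token context is what makes it a candidate for the long-sequence use cases mentioned in the migration steps.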
🔧 Affected Symbols
ModernBert, Aria, TimmWrapper, AutoModelForImageClassification, AutoImageProcessor, ColPali, Falcon3, Bamba, VitPose, Pixtral