Transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Release History
v5.2.0 (Breaking · 18 fixes · 4 features): This release introduces several major new models including VoxtralRealtime, GLM-5, and Qwen3.5, alongside significant internal refactoring, particularly around attention mechanisms and trainer stability.
v5.1.0 (Breaking · 21 fixes · 8 features): This release introduces four major new models: EXAONE-MoE, PP-DocLayoutV3, Youtu-LLM, and GlmOcr. It also includes several breaking changes related to model structure adjustments, cache initialization for sliding window attention, and configuration cleanup.
v5.0.0 (Breaking · 1 fix · 4 features): Transformers v5 is the first major release in five years, introducing significant API refactors like dynamic weight loading via WeightConverter and simplifying tokenizer architecture by consolidating slow/fast implementations. The release cadence is shifting to weekly minor updates.
v5.0.0rc3 (Breaking · 17 fixes · 10 features): This release candidate (v5.0.0rc3) introduces several new models including GLM-Lite and LWDetr, while aggressively removing deprecated classes and fixing numerous integration tests and minor bugs.
v4.57.6 (1 fix): This patch release includes a fix for Qwen VL models that were failing to load correctly after configuration saving and reloading, complementing a previous patch.
v4.57.5 (2 fixes): This patch release includes fixes for lr_scheduler parsing and addresses an issue with skipped keys in setattr for QwenVL.
v4.57.4 (3 fixes): This patch release for v4 includes minor fixes for remote generation methods like grouped beam search, vLLM integration, and an offline tokenizer loading issue for Mistral models.
v5.0.0rc2 (Breaking · 8 fixes · 7 features): This release focuses on fixing AutoTokenizer enforcement, optimizing MoE performance with batched implementations, and significantly improving model loading speeds via meta device initialization.
v5.0.0rc1 (Breaking · 6 fixes · 7 features): This release introduces major breaking changes including 'auto' as the default dtype, 50GB shard sizes for saving, and mandatory **kwargs in forward methods. It also adds support for new models (FastVLM, Lasr, PaddleOCR-VL) and a new dynamic weight loader for quantization.
v5.0.0rc0 (Breaking · 2 fixes · 6 features): Transformers v5 introduces a major overhaul of the library, featuring a new dynamic weight loading API and a unified tokenizer backend system to simplify internals and improve performance.
v4.57.3 (2 fixes): This emergency patch fixes a critical bug affecting model loading when local_files_only is set to True and addresses a typo from a previous patch.
v4.57.2 (Breaking · 2 fixes · 3 features): This patch release focuses on fixing Mistral tokenizer mappings and Tekken pattern matching, while also correcting a decorator error in the device memory utility.
v4.57.1 (5 fixes): This patch release focuses on fixing dependency parsing issues with Optax and Poetry, alongside stability improvements for FSDP and Python 3.9 support.
v4.57.0 (6 features): This release introduces support for several next-generation model architectures, including the high-efficiency Qwen3-Next and Qwen3-VL series, the privacy-focused VaultGemma, and the high-speed Longcat Flash MoE.
v4.56.2 (3 fixes · 1 feature): This release focuses on bug fixes for Jetmoe and Emu3 models, addresses a getter regression, and improves multi-processing performance for processors.
v4.56.1-Vault-Gemma-preview (3 features): This release introduces a preview of the Vault-Gemma model, a 1B parameter decoder-only model trained with sequence-level differential privacy.
v4.56.1 (Breaking · 6 fixes): This patch release primarily fixes the new 'dtype' argument in pipelines and addresses several model-specific bugs including Llama4 accuracy and SamAttention attribute errors.
v4.56.0-Embedding-Gemma-preview (3 features): This release introduces a preview of the EmbeddingGemma model, a highly efficient 308M parameter multilingual embedding model optimized for on-device RAG and mobile use cases.
v4.56.0 (1 fix · 12 features): This release introduces several major vision and multimodal models including Dino v3, SAM 2, and Ovis 2, alongside a significant refactor of the caching system to optimize memory for sliding window attention.
v4.55.4 (1 fix): This patch release corrects a technical error in the previous release process to properly apply the fix for issue #40197.
v4.55.3 (Breaking · 5 fixes): Patch release 4.55.3 focuses on stability improvements for FlashAttention-2 on Ascend NPU, FSDP sharding fixes, and critical bug fixes for GPT-OSS and Mamba models.
v4.55.2 (Breaking · 1 fix): Patch release 4.55.2 fixes a critical regression in Flash Attention 2 (FA2) generations caused by a missing utility import in version 4.55.1.
v4.55.1 (Breaking · 9 fixes · 2 features): Patch release 4.55.1 focuses on stabilizing the MXFP4 quantization for GPT-OSS models and resolving device-related bugs across several multimodal models like Idefics and SmolVLM.
4.55.0-GLM-4.5V-preview (6 features): This release introduces GLM-4.5V, a high-performance multimodal reasoning model based on GLM-4.5-Air, featuring advanced capabilities in image, video, and GUI analysis.
v4.55.0 (Breaking · 8 features): OpenAI released GPT OSS, an open-source (Apache 2.0) MoE model family in 21B and 117B sizes featuring 4-bit MXFP4 quantization and Flash Attention 3 support. These models are optimized for reasoning and agentic tasks, compatible with the new Responses API and standard transformers workflows.
4.54.1 (Breaking · 10 fixes · 2 features): A maintenance patch release focused on fixing regressions in cache inheritance, device placement, and distributed training (TP/device-mesh) across various model architectures like ModernBERT, GPT2, and Mamba.
v4.54.0 (Breaking · 1 fix · 10 features): This release focuses on reducing library bloat and increasing speed through refactored Llama models, megablocks kernel integration, and native distributed training. It also introduces several new model architectures including Ernie 4.5, Voxtral, DeepSeek-V2, and LFM2.
v4.53.2-Ernie-4.5-preview (2 features): This preview release introduces Baidu's Ernie 4.5 model family to Transformers, including a 0.3B dense model and MoE variants (21B and 300B).
v4.53.3 (1 fix): A small patch release (v4.53.3) that refactors OpenTelemetry integration by removing explicit provider setter calls.
v4.53.2-modernbert-decoder-preview (Breaking · 5 features): This release introduces a preview of the ModernBERT Decoder, a causal language model variant of the ModernBERT architecture designed for autoregressive generation and sequence classification.
v4.53.2 (Breaking · 6 fixes · 2 features): This patch release focuses on critical bug fixes for GLM-4.1V and GLM-4V models, resolves hardware-specific issues on Ascend NPU, and deprecates the sliding window feature.
v4.53.1 (Breaking · 7 fixes · 1 feature): This patch release focuses on bug fixes for Vision Language Models (VLMs) like Qwen2-VL and SmolVLM, alongside introducing packed tensor format support for various attention backends.
v4.53.0 (8 features): Release v4.53.0 introduces several major model architectures including Gemma 3n, Dia TTS, Kyutai STT, and the massive 456B MiniMax model. The update focuses heavily on multimodal capabilities, efficient parameter usage, and long-context support.
v4.52.4-Kyutai-STT-preview (3 features): This release introduces a preview of the Kyutai-STT model architecture, featuring 1B and 2.6B parameter checkpoints for high-accuracy speech-to-text transcription.
v4.52.4-VJEPA-2-preview (3 features): This release introduces a preview of the V-JEPA 2 model, a state-of-the-art self-supervised video encoder for motion understanding and robot manipulation tasks.
v4.52.4-ColQwen2-preview (4 features): This release introduces a preview of the ColQwen2 model, a visual-based document retrieval system that leverages the Qwen2-VL backbone for late interaction similarity scoring.
v4.52.4 (Breaking · 4 fixes · 2 features): This patch release focuses on bug fixes for Vision Language Models (Qwen-VL, PaliGemma), attention scaling corrections for OPT, and compatibility improvements for older PyTorch versions.
v4.52.3 (2 fixes): This patch release fixes issues related to torch distributed initialization and protects ParallelInterface imports to ensure stability in distributed environments.
v4.52.2 (2 fixes · 2 features): This patch release re-introduces 3D parallel training support while fixing a device map override bug and improving import error clarity.
v4.52.1 (5 features): This release introduces several major multimodal and specialized models, including the Qwen2.5-Omni streaming model, the high-precision SAM-HQ segmenter, and the D-FINE real-time object detector.
v4.51.3-CSM-preview (Breaking · 5 features): This release introduces the Conversational Speech Model (CSM), an open-source contextual text-to-speech model capable of generating natural speech from multi-turn dialogue context.
v4.51.3-GraniteMoeHybrid-preview (3 features): This release introduces the GraniteMoeHybrid model architecture, a hybrid design combining state space layers and Mixture-of-Experts (MoE) attention, available as a stable preview ahead of the v4.52.0 minor release.
v4.51.3-D-FINE-preview (3 features): This release introduces a preview of the D-FINE model, a high-performance real-time object detector featuring fine-grained distribution refinement for superior localization accuracy.
v4.51.3-SAM-HQ-preview (5 features): This release introduces a preview of SAM-HQ, an enhancement to the Segment Anything Model that provides higher quality segmentation masks with minimal additional parameters.
v4.51.3-BitNet-preview (2 features): This preview release introduces the BitNet model architecture to the transformers library, enabling high-performance 1-bit LLM inference.
v4.51.3-LlamaGuard-preview (5 features): This release introduces LlamaGuard 4 and Llama Prompt Guard 2, providing multimodal safety moderation for text and images. It is available as a preview tag prior to the official v4.52.0 minor release.
v4.51.3-Qwen2.5-Omni-preview (6 features): This release introduces Qwen2.5-Omni, an end-to-end multimodal model capable of perceiving text, images, audio, and video while generating synchronized text and speech responses.
v4.51.3-InternVL-preview (5 features): This preview release introduces support for the InternVL 2.5 and 3 family of multimodal models, featuring a native multimodal pre-training paradigm and state-of-the-art performance on visual-linguistic tasks.
v4.51.3-Janus-preview (5 features): This release introduces a preview of the Janus and Janus-Pro models, a unified multimodal framework capable of both visual understanding and text-to-image generation by decoupling visual encoding pathways.
v4.51.3-TimesFM-preview (3 features): This release introduces a preview of TimesFM, a decoder-only foundation model for time-series forecasting, available as a specialized tag on top of transformers v4.51.3.
v4.51.3-MLCD-preview (3 features): This release introduces a preview of the MLCD vision model, a foundational visual model optimized for multimodal LLMs like LLaVA, developed by DeepGlint-AI.
v4.51.3 (2 fixes · 1 feature): This patch release introduces support for the GLM-4 model and includes several fixes for PyTorch version compatibility, specifically regarding FlexAttention.
v4.51.2 (3 fixes · 1 feature): A minor patch release focusing on Llama4 model corrections and the introduction of Attention Quantization with FBGemm and Tensor Parallelism.
v4.51.1 (8 fixes): This patch release focuses on stabilizing Llama 4 support and fixing compatibility issues with torch 2.6.0, DeepSpeed, and weight initialization.
v4.51.0 (Breaking · 9 fixes · 6 features): This release introduces support for Llama 4, Phi4-Multimodal, DeepSeek-v3, and Qwen3 architectures, alongside a major documentation overhaul and modularization of speech models.
v4.50.3-DeepSeek-3 (6 features): This release introduces support for the DeepSeek-V3 (DeepSeek-R1) model, featuring MLA and DeepSeekMoE architectures, available via a specific git tag on top of version 4.50.3.
v4.50.3 (3 fixes): This patch release fixes bugs related to beam search output cropping, BLIP-2 floating-point precision mismatches, and PixtralProcessor configuration.
v4.50.2 (2 fixes · 1 feature): A patch release focusing on backend stability, specifically fixing image processing for Gemma3 and Qwen2-VL, and updating torch version validation.
v4.50.1 (4 fixes): A patch release addressing minor bugs in hub kernels, remote code, and specific model implementations like Chameleon and PyTorch deformable attention.
v4.50.0 (1 fix · 7 features): Release v4.50.0 introduces a new model-based release strategy and adds support for several major vision-language models including Gemma 3, Aya Vision, Mistral 3.1, and SigLIP-2.
v4.49.0-Mistral-3 (7 features): This release introduces Mistral 3 (Mistral Small 3.1) to the Transformers library, a 24B parameter model featuring 128k context length and advanced vision-language capabilities.
v4.49.0-Gemma-3 (6 features): This release introduces Google's Gemma 3 multimodal models to the transformers library, featuring a SigLIP vision encoder and Gemma 2 language decoder with support for high-resolution image cropping and multi-image inference.
v4.49.0-AyaVision (4 features): This release introduces Aya Vision 8B and 32B, multilingual multimodal models combining SigLIP-2 vision encoders with Cohere language models, available via a specialized transformers release tag.
v4.49.0-SigLIP-2 (Breaking · 5 features): This release introduces SigLIP-2, a new family of multilingual vision-language encoders featuring improved semantic understanding and support for native aspect ratio image processing via the NaFlex variant.
v4.49.0-SmolVLM-2 (4 features): This release introduces SmolVLM-2, a lightweight vision-language model based on Idefics3 and SmolLM2 that supports multi-image and video processing.
v4.49.0 (Breaking · 2 fixes · 12 features): This release introduces several new models including Helium, Qwen2.5-VL, and Zamba2, alongside a new CLI chat feature and standardized fast image processors.
v4.48.3 (4 fixes): This patch release primarily addresses Python 3.9 compatibility issues, fixes device failures in the RoPE module, and resolves generation bugs for PaliGemma2.
v4.48.2 (Breaking · 5 fixes): This patch release primarily restores Python 3.9 compatibility and fixes regressions related to DBRX model loading and HybridCache mask slicing.
v4.48.1 (3 fixes): Patch release v4.48.1 fixes a typo in Phi model attention bias, resolves a logic error in gradient accumulation loss, and patches Moonshine's generate wrapper.
v4.48.0 (8 features): This release introduces several major model architectures including ModernBERT, Aria (MoE), and Bamba (Mamba-2), while adding a TimmWrapper to integrate timm library models directly into the Transformers ecosystem.
Common Errors
OutOfMemoryError (2 reports): OutOfMemoryError in transformers usually stems from models or batch sizes too large for the available GPU memory. Reduce the batch size during training/inference, enable gradient accumulation, or distribute the model across devices with `accelerate` (e.g. `device_map="auto"`). Quantization (e.g. bitsandbytes) and offloading layers to CPU/disk via accelerate's device_map can further decrease the memory footprint.
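The options above can be combined into one `from_pretrained` call. A minimal sketch, assuming a bitsandbytes install; the helper name `build_low_memory_kwargs` is made up for this illustration, while `device_map`, `load_in_4bit`, and `offload_folder` are real `from_pretrained`/accelerate parameters (newer releases prefer `quantization_config=BitsAndBytesConfig(...)` over `load_in_4bit`):

```python
# Hypothetical helper collecting the memory-saving options discussed above.
# Nothing here downloads a model; it only assembles keyword arguments.

def build_low_memory_kwargs(use_4bit=True, offload_dir=None):
    """Collect from_pretrained keyword arguments that reduce GPU memory use."""
    kwargs = {"device_map": "auto"}  # let accelerate shard layers across GPU/CPU/disk
    if use_4bit:
        kwargs["load_in_4bit"] = True  # bitsandbytes 4-bit quantization
    if offload_dir is not None:
        kwargs["offload_folder"] = offload_dir  # spill weights to disk when RAM is short
    return kwargs

# Example: AutoModelForCausalLM.from_pretrained(model_id, **build_low_memory_kwargs())
print(build_low_memory_kwargs(offload_dir="/tmp/offload"))
```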
ChildFailedError (2 reports): ChildFailedError in transformers often arises from inconsistencies in the distributed training setup, particularly when using Accelerate and FSDP with Trainer. Ensure all processes share the same environment and configuration, and set `ddp_find_unused_parameters=False` in your Trainer arguments to avoid deadlocks from the unused-parameter search. Also verify that all ranks stay synchronized, especially in custom training loops or callbacks where data loading or model updates may differ across ranks.
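A minimal sketch of the relevant settings: a plain dict stands in for `transformers.TrainingArguments` so the example stays self-contained. `ddp_find_unused_parameters` is a real TrainingArguments field; the helper name and the other values are illustrative assumptions:

```python
# Illustrative only: arguments aimed at avoiding DDP hangs from unused parameters.

def distributed_trainer_args(output_dir):
    """Build TrainingArguments-style kwargs for distributed runs."""
    return {
        "output_dir": output_dir,
        "ddp_find_unused_parameters": False,  # skip DDP's unused-parameter search
        "per_device_train_batch_size": 8,     # keep per-rank settings identical
        "seed": 42,                           # same seed on every rank
    }

# Usage: TrainingArguments(**distributed_trainer_args("out/"))
print(distributed_trainer_args("out/"))
```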
RuntimeError (2 reports): RuntimeError in transformers, especially with torch.compile or specific models like Qwen, often stems from unsupported configurations such as incompatible dtype settings ("auto" can be problematic) or issues in the model's forward pass under compilation. Set `torch_dtype=torch.float16` or `torch.bfloat16` explicitly when loading the model and generating text, and keep transformers and torch on recent stable releases (or pin versions known to work). If using torch.compile, verify that the specific model or operation is supported, and disable compilation for problematic sections otherwise.
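A sketch of pinning the dtype instead of relying on "auto". The helper name is made up for this illustration; `torch_dtype` is the real `from_pretrained` parameter (renamed to `dtype` in the v5 line), and transformers accepts either a string name or a `torch.dtype`:

```python
# Hypothetical helper returning from_pretrained kwargs with an explicit dtype.

def explicit_dtype_kwargs(half_precision="bfloat16"):
    """Return from_pretrained kwargs that pin the dtype instead of 'auto'."""
    if half_precision not in ("float16", "bfloat16"):
        raise ValueError("pick an explicit half-precision dtype")
    return {"torch_dtype": half_precision}  # torch.bfloat16 also works when torch is available

# Usage: AutoModelForCausalLM.from_pretrained(model_id, **explicit_dtype_kwargs())
print(explicit_dtype_kwargs())
```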
NotImplementedError (1 report): NotImplementedError in transformers often arises when a method required for a task (such as saving a model) has not been defined or overridden in a particular model class or configuration. To resolve it, either implement the missing method in the relevant class (e.g. `save_pretrained` for saving) or inherit from the appropriate base class and configuration files so the required methods are provided. Review the model's configuration, inheritance structure, and required functionality to pinpoint the missing implementation.
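A generic illustration of the pattern behind this error, not transformers code: a base class declares a method that raises `NotImplementedError`, and the subclass must supply the implementation. The class names are invented for the sketch:

```python
# Generic pattern: the base class raises until a subclass provides the method.

class BaseExporter:
    """Stands in for a base class whose required method was never overridden."""
    def save_pretrained(self, path):
        raise NotImplementedError("subclasses must implement save_pretrained")

class JsonExporter(BaseExporter):
    def save_pretrained(self, path):
        # Providing the missing implementation resolves the error.
        return "saved to " + path

print(JsonExporter().save_pretrained("out/"))  # → saved to out/
```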
FileNotFoundError (1 report): FileNotFoundError in transformers usually arises when the specified tokenizer or model files are not present in the expected location, especially with `local_files_only=True` or offline mode. Either set `local_files_only=False` on the initial load so the files can be downloaded from the Hub, or manually download the files into the specified cache directory. If the files are already downloaded but the error persists, double-check the specified path and file names.
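Before setting `local_files_only=True`, it can help to verify that the expected files actually exist locally. A hedged sketch: the helper name is made up, and the required file names vary by model (`config.json` is the usual minimum, tokenizers add files like `tokenizer.json`):

```python
# Hypothetical pre-flight check for offline loading.
import os

def missing_offline_files(model_dir, required=("config.json",)):
    """List the expected files absent from a local model directory."""
    return [name for name in required
            if not os.path.isfile(os.path.join(model_dir, name))]

# Usage: if missing_offline_files("./my-model"): reload once with local_files_only=False
```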
DistBackendError (1 report): DistBackendError usually occurs when the distributed training environment isn't properly initialized, often due to NCCL issues or incorrect configuration of `torch.distributed`. Ensure NCCL is correctly installed and configured (setting the `NCCL_DEBUG=INFO` environment variable surfaces detailed diagnostics), then double-check the `torch.distributed.init_process_group` call for correct `backend`, `rank`, and `world_size` values. Also make sure each rank selects its own GPU (e.g. `torch.cuda.set_device(local_rank)`) before initialization.
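A sketch of assembling the values `init_process_group` needs from the environment variables that launchers like torchrun set. The helper name is invented; `backend`, `rank`, and `world_size` are the real parameters, and `RANK`/`WORLD_SIZE` are the standard torchrun variables:

```python
# Hypothetical helper: read init_process_group arguments from launcher env vars.
import os

def dist_init_kwargs(env=None):
    """Build kwargs for torch.distributed.init_process_group."""
    env = env if env is not None else os.environ
    return {
        "backend": "nccl",                            # GPU training; use "gloo" on CPU
        "rank": int(env.get("RANK", 0)),              # this process's global index
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }

# Usage: torch.distributed.init_process_group(**dist_init_kwargs())
print(dist_init_kwargs({"RANK": "1", "WORLD_SIZE": "4"}))
```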
Related Data & ML Packages
TensorFlow: An Open Source Machine Learning Framework for Everyone
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
scikit-learn: machine learning in Python
pandas: Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Streamlit: A faster way to build and share data apps
Gradio: Build and share delightful machine learning apps, all in Python