Transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Release History
v5.2.0 (Breaking · 18 fixes · 4 features): This release introduces several major new models including VoxtralRealtime, GLM-5, and Qwen3.5, alongside significant internal refactoring, particularly around attention mechanisms and trainer stability.
v5.1.0 (Breaking · 21 fixes · 8 features): This release introduces four major new models: EXAONE-MoE, PP-DocLayoutV3, Youtu-LLM, and GlmOcr. It also includes several breaking changes related to model structure adjustments, cache initialization for sliding window attention, and configuration cleanup.
v5.0.0 (Breaking · 1 fix · 4 features): Transformers v5 is the first major release in five years, introducing significant API refactors like dynamic weight loading via WeightConverter and simplifying tokenizer architecture by consolidating slow/fast implementations. The release cadence is shifting to weekly minor updates.
v5.0.0rc3 (Breaking · 17 fixes · 10 features): This release candidate (v5.0.0rc3) introduces several new models including GLM-Lite and LWDetr, while aggressively removing deprecated classes and fixing numerous integration tests and minor bugs.
v4.57.6 (1 fix): This patch release includes a fix for Qwen VL models that were failing to load correctly after configuration saving and reloading, complementing a previous patch.
v4.57.5 (2 fixes): This patch release includes fixes for lr_scheduler parsing and addresses an issue with skipped keys in setattr for QwenVL.
v4.57.4 (3 fixes): This patch release for v4 includes minor fixes for remote generation methods like grouped beam search, vLLM integration, and an offline tokenizer loading issue for Mistral models.
v5.0.0rc2 (Breaking · 8 fixes · 7 features): This release focuses on fixing AutoTokenizer enforcement, optimizing MoE performance with batched implementations, and significantly improving model loading speeds via meta device initialization.
v5.0.0rc1 (Breaking · 6 fixes · 7 features): This release introduces major breaking changes including 'auto' as the default dtype, 50GB shard sizes for saving, and mandatory **kwargs in forward methods. It also adds support for new models (FastVLM, Lasr, PaddleOCR-VL) and a new dynamic weight loader for quantization.
v5.0.0rc0 (Breaking · 2 fixes · 6 features): Transformers v5 introduces a major overhaul of the library, featuring a new dynamic weight loading API and a unified tokenizer backend system to simplify internals and improve performance.
v4.57.3 (2 fixes): This emergency patch fixes a critical bug affecting model loading when local_files_only is set to True and addresses a typo from a previous patch.
v4.57.2 (Breaking · 2 fixes · 3 features): This patch release focuses on fixing Mistral tokenizer mappings and Tekken pattern matching, while also correcting a decorator error in the device memory utility.
v4.57.1 (5 fixes): This patch release focuses on fixing dependency parsing issues with Optax and Poetry, alongside stability improvements for FSDP and Python 3.9 support.
v4.57.0 (6 features): This release introduces support for several next-generation model architectures, including the high-efficiency Qwen3-Next and Qwen3-VL series, the privacy-focused VaultGemma, and the high-speed Longcat Flash MoE.
v4.56.2 (3 fixes · 1 feature): This release focuses on bug fixes for Jetmoe and Emu3 models, addresses a getter regression, and improves multi-processing performance for processors.
v4.56.1-Vault-Gemma-preview (3 features): This release introduces a preview of the Vault-Gemma model, a 1B parameter decoder-only model trained with sequence-level differential privacy.
v4.56.1 (Breaking · 6 fixes): This patch release primarily fixes the new 'dtype' argument in pipelines and addresses several model-specific bugs including Llama4 accuracy and SamAttention attribute errors.
v4.56.0-Embedding-Gemma-preview (3 features): This release introduces a preview of the EmbeddingGemma model, a highly efficient 308M parameter multilingual embedding model optimized for on-device RAG and mobile use cases.
v4.56.0 (1 fix · 12 features): This release introduces several major vision and multimodal models including Dino v3, SAM 2, and Ovis 2, alongside a significant refactor of the caching system to optimize memory for sliding window attention.
v4.55.4 (1 fix): This patch release corrects a technical error in the previous release process to properly apply the fix for issue #40197.
v4.55.3 (Breaking · 5 fixes): Patch release 4.55.3 focuses on stability improvements for FlashAttention-2 on Ascend NPU, FSDP sharding fixes, and critical bug fixes for GPT-OSS and Mamba models.
v4.55.2 (Breaking · 1 fix): Patch release 4.55.2 fixes a critical regression in Flash Attention 2 (FA2) generations caused by a missing utility import in version 4.55.1.
v4.55.1 (Breaking · 9 fixes · 2 features): Patch release 4.55.1 focuses on stabilizing the MXFP4 quantization for GPT-OSS models and resolving device-related bugs across several multimodal models like Idefics and SmolVLM.
4.55.0-GLM-4.5V-preview (6 features): This release introduces GLM-4.5V, a high-performance multimodal reasoning model based on GLM-4.5-Air, featuring advanced capabilities in image, video, and GUI analysis.
v4.55.0 (Breaking · 8 features): OpenAI released GPT OSS, an open-source (Apache 2.0) MoE model family in 21B and 117B sizes featuring 4-bit MXFP4 quantization and Flash Attention 3 support. These models are optimized for reasoning and agentic tasks, compatible with the new Responses API and standard transformers workflows.
4.54.1 (Breaking · 10 fixes · 2 features): A maintenance patch release focused on fixing regressions in cache inheritance, device placement, and distributed training (TP/device-mesh) across various model architectures like ModernBERT, GPT2, and Mamba.
v4.54.0 (Breaking · 1 fix · 10 features): This release focuses on reducing library bloat and increasing speed through refactored Llama models, megablocks kernel integration, and native distributed training. It also introduces several new model architectures including Ernie 4.5, Voxtral, DeepSeek-V2, and LFM2.
v4.53.2-Ernie-4.5-preview (2 features): This preview release introduces Baidu's Ernie 4.5 model family to Transformers, including a 0.3B dense model and MoE variants (21B and 300B).
v4.53.3 (1 fix): A small patch release (v4.53.3) that refactors OpenTelemetry integration by removing explicit provider setter calls.
v4.53.2-modernbert-decoder-preview (Breaking · 5 features): This release introduces a preview of the ModernBERT Decoder, a causal language model variant of the ModernBERT architecture designed for autoregressive generation and sequence classification.
v4.53.2 (Breaking · 6 fixes · 2 features): This patch release focuses on critical bug fixes for GLM-4.1V and GLM-4V models, resolves hardware-specific issues on Ascend NPU, and deprecates the sliding window feature.
v4.53.1 (Breaking · 7 fixes · 1 feature): This patch release focuses on bug fixes for Vision Language Models (VLMs) like Qwen2-VL and SmolVLM, alongside introducing packed tensor format support for various attention backends.
v4.53.0 (8 features): Release v4.53.0 introduces several major model architectures including Gemma 3n, Dia TTS, Kyutai STT, and the massive 456B MiniMax model. The update focuses heavily on multimodal capabilities, efficient parameter usage, and long-context support.
v4.52.4-Kyutai-STT-preview (3 features): This release introduces a preview of the Kyutai-STT model architecture, featuring 1B and 2.6B parameter checkpoints for high-accuracy speech-to-text transcription.
v4.52.4-VJEPA-2-preview (3 features): This release introduces a preview of the V-JEPA 2 model, a state-of-the-art self-supervised video encoder for motion understanding and robot manipulation tasks.
v4.52.4-ColQwen2-preview (4 features): This release introduces a preview of the ColQwen2 model, a visual-based document retrieval system that leverages the Qwen2-VL backbone for late interaction similarity scoring.
v4.52.4 (Breaking · 4 fixes · 2 features): This patch release focuses on bug fixes for Vision Language Models (Qwen-VL, PaliGemma), attention scaling corrections for OPT, and compatibility improvements for older PyTorch versions.
v4.52.3 (2 fixes): This patch release fixes issues related to torch distributed initialization and protects ParallelInterface imports to ensure stability in distributed environments.
v4.52.2 (2 fixes · 2 features): This patch release re-introduces 3D parallel training support while fixing a device map override bug and improving import error clarity.
v4.52.1 (5 features): This release introduces several major multimodal and specialized models, including the Qwen2.5-Omni streaming model, the high-precision SAM-HQ segmenter, and the D-FINE real-time object detector.
v4.51.3-CSM-preview (Breaking · 5 features): This release introduces the Conversational Speech Model (CSM), an open-source contextual text-to-speech model capable of generating natural speech from multi-turn dialogue context.
v4.51.3-GraniteMoeHybrid-preview (3 features): This release introduces the GraniteMoeHybrid model architecture, a hybrid design combining state space layers and Mixture-of-Experts (MoE) attention, available as a stable preview ahead of the v4.52.0 minor release.
v4.51.3-D-FINE-preview (3 features): This release introduces a preview of the D-FINE model, a high-performance real-time object detector featuring fine-grained distribution refinement for superior localization accuracy.
v4.51.3-SAM-HQ-preview (5 features): This release introduces a preview of SAM-HQ, an enhancement to the Segment Anything Model that provides higher quality segmentation masks with minimal additional parameters.
v4.51.3-BitNet-preview (2 features): This preview release introduces the BitNet model architecture to the transformers library, enabling high-performance 1-bit LLM inference.
v4.51.3-LlamaGuard-preview (5 features): This release introduces LlamaGuard 4 and Llama Prompt Guard 2, providing multimodal safety moderation for text and images. It is available as a preview tag prior to the official v4.52.0 minor release.
v4.51.3-Qwen2.5-Omni-preview (6 features): This release introduces Qwen2.5-Omni, an end-to-end multimodal model capable of perceiving text, images, audio, and video while generating synchronized text and speech responses.
v4.51.3-InternVL-preview (5 features): This preview release introduces support for the InternVL 2.5 and 3 family of multimodal models, featuring a native multimodal pre-training paradigm and state-of-the-art performance on visual-linguistic tasks.
v4.51.3-Janus-preview (5 features): This release introduces a preview of the Janus and Janus-Pro models, a unified multimodal framework capable of both visual understanding and text-to-image generation by decoupling visual encoding pathways.
v4.51.3-TimesFM-preview (3 features): This release introduces a preview of TimesFM, a decoder-only foundation model for time-series forecasting, available as a specialized tag on top of transformers v4.51.3.
v4.51.3-MLCD-preview (3 features): This release introduces a preview of the MLCD vision model, a foundational visual model optimized for multimodal LLMs like LLaVA, developed by DeepGlint-AI.
v4.51.3 (2 fixes · 1 feature): This patch release introduces support for the GLM-4 model and includes several fixes for PyTorch version compatibility, specifically regarding FlexAttention.
v4.51.2 (3 fixes · 1 feature): A minor patch release focusing on Llama4 model corrections and the introduction of Attention Quantization with FBGemm and Tensor Parallelism.
v4.51.1 (8 fixes): This patch release focuses on stabilizing Llama 4 support and fixing compatibility issues with torch 2.6.0, DeepSpeed, and weight initialization.
v4.51.0 (Breaking · 9 fixes · 6 features): This release introduces support for Llama 4, Phi4-Multimodal, DeepSeek-v3, and Qwen3 architectures, alongside a major documentation overhaul and modularization of speech models.
v4.50.3-DeepSeek-3 (6 features): This release introduces support for the DeepSeek-V3 (DeepSeek-R1) model, featuring MLA and DeepSeekMoE architectures, available via a specific git tag on top of version 4.50.3.
v4.50.3 (3 fixes): This patch release fixes bugs related to beam search output cropping, BLIP-2 floating-point precision mismatches, and PixtralProcessor configuration.
v4.50.2 (2 fixes · 1 feature): A patch release focusing on backend stability, specifically fixing image processing for Gemma3 and Qwen2-VL, and updating torch version validation.
v4.50.1 (4 fixes): A patch release addressing minor bugs in hub kernels, remote code, and specific model implementations like Chameleon and PyTorch deformable attention.
v4.50.0 (1 fix · 7 features): Release v4.50.0 introduces a new model-based release strategy and adds support for several major vision-language models including Gemma 3, Aya Vision, Mistral 3.1, and SigLIP-2.
v4.49.0-Mistral-3 (7 features): This release introduces Mistral 3 (Mistral Small 3.1) to the Transformers library, a 24B parameter model featuring 128k context length and advanced vision-language capabilities.
v4.49.0-Gemma-3 (6 features): This release introduces Google's Gemma 3 multimodal models to the transformers library, featuring a SigLIP vision encoder and Gemma 2 language decoder with support for high-resolution image cropping and multi-image inference.
v4.49.0-AyaVision (4 features): This release introduces Aya Vision 8B and 32B, multilingual multimodal models combining SigLIP-2 vision encoders with Cohere language models, available via a specialized transformers release tag.
v4.49.0-SigLIP-2 (Breaking · 5 features): This release introduces SigLIP-2, a new family of multilingual vision-language encoders featuring improved semantic understanding and support for native aspect ratio image processing via the NaFlex variant.
v4.49.0-SmolVLM-2 (4 features): This release introduces SmolVLM-2, a lightweight vision-language model based on Idefics3 and SmolLM2 that supports multi-image and video processing.
v4.49.0 (Breaking · 2 fixes · 12 features): This release introduces several new models including Helium, Qwen2.5-VL, and Zamba2, alongside a new CLI chat feature and standardized fast image processors.
v4.48.3 (4 fixes): This patch release primarily addresses Python 3.9 compatibility issues, fixes device failures in the RoPE module, and resolves generation bugs for PaliGemma2.
v4.48.2 (Breaking · 5 fixes): This patch release primarily restores Python 3.9 compatibility and fixes regressions related to DBRX model loading and HybridCache mask slicing.
v4.48.1 (3 fixes): Patch release v4.48.1 fixes a typo in Phi model attention bias, resolves a logic error in gradient accumulation loss, and patches Moonshine's generate wrapper.
v4.48.0 (8 features): This release introduces several major model architectures including ModernBERT, Aria (MoE), and Bamba (Mamba-2), while adding a TimmWrapper to integrate timm library models directly into the Transformers ecosystem.
Common Errors
OutOfMemoryError (2 reports): OutOfMemoryError in transformers usually stems from models or batch sizes too large for the available GPU memory. Reduce the batch size during training/inference, enable gradient accumulation, or distribute the model across devices with `accelerate` (e.g. `device_map="auto"`). Quantization (e.g. bitsandbytes) and offloading layers to CPU/disk via accelerate's device_map can further decrease the memory footprint.
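The options above can be combined into one `from_pretrained` call. A minimal sketch, assuming a bitsandbytes install; the helper name `build_low_memory_kwargs` is made up for this illustration, while `device_map`, `load_in_4bit`, and `offload_folder` are real `from_pretrained`/accelerate parameters (newer releases prefer `quantization_config=BitsAndBytesConfig(...)` over `load_in_4bit`):

```python
# Hypothetical helper collecting the memory-saving options discussed above.
# Nothing here downloads a model; it only assembles keyword arguments.

def build_low_memory_kwargs(use_4bit=True, offload_dir=None):
    """Collect from_pretrained keyword arguments that reduce GPU memory use."""
    kwargs = {"device_map": "auto"}  # let accelerate shard layers across GPU/CPU/disk
    if use_4bit:
        kwargs["load_in_4bit"] = True  # bitsandbytes 4-bit quantization
    if offload_dir is not None:
        kwargs["offload_folder"] = offload_dir  # spill weights to disk when RAM is short
    return kwargs

# Example: AutoModelForCausalLM.from_pretrained(model_id, **build_low_memory_kwargs())
print(build_low_memory_kwargs(offload_dir="/tmp/offload"))
```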
ChildFailedError (2 reports): ChildFailedError in transformers often arises from inconsistencies in the distributed training setup, particularly when using Accelerate and FSDP with Trainer. Ensure all processes share the same environment and configuration, and set `ddp_find_unused_parameters=False` in your Trainer arguments to avoid deadlocks from the unused-parameter search. Also verify that all ranks stay synchronized, especially in custom training loops or callbacks where data loading or model updates may differ across ranks.
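A minimal sketch of the relevant settings: a plain dict stands in for `transformers.TrainingArguments` so the example stays self-contained. `ddp_find_unused_parameters` is a real TrainingArguments field; the helper name and the other values are illustrative assumptions:

```python
# Illustrative only: arguments aimed at avoiding DDP hangs from unused parameters.

def distributed_trainer_args(output_dir):
    """Build TrainingArguments-style kwargs for distributed runs."""
    return {
        "output_dir": output_dir,
        "ddp_find_unused_parameters": False,  # skip DDP's unused-parameter search
        "per_device_train_batch_size": 8,     # keep per-rank settings identical
        "seed": 42,                           # same seed on every rank
    }

# Usage: TrainingArguments(**distributed_trainer_args("out/"))
print(distributed_trainer_args("out/"))
```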
RuntimeError (2 reports): RuntimeError in transformers, especially with torch.compile or specific models like Qwen, often stems from unsupported configurations such as incompatible dtype settings ("auto" can be problematic) or issues in the model's forward pass under compilation. Set `torch_dtype=torch.float16` or `torch.bfloat16` explicitly when loading the model and generating text, and keep transformers and torch on recent stable releases (or pin versions known to work). If using torch.compile, verify that the specific model or operation is supported, and disable compilation for problematic sections otherwise.
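A sketch of pinning the dtype instead of relying on "auto". The helper name is made up for this illustration; `torch_dtype` is the real `from_pretrained` parameter (renamed to `dtype` in the v5 line), and transformers accepts either a string name or a `torch.dtype`:

```python
# Hypothetical helper returning from_pretrained kwargs with an explicit dtype.

def explicit_dtype_kwargs(half_precision="bfloat16"):
    """Return from_pretrained kwargs that pin the dtype instead of 'auto'."""
    if half_precision not in ("float16", "bfloat16"):
        raise ValueError("pick an explicit half-precision dtype")
    return {"torch_dtype": half_precision}  # torch.bfloat16 also works when torch is available

# Usage: AutoModelForCausalLM.from_pretrained(model_id, **explicit_dtype_kwargs())
print(explicit_dtype_kwargs())
```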
NotImplementedError (1 report): NotImplementedError in transformers often arises when a method required for a task (such as saving a model) has not been defined or overridden in a particular model class or configuration. To resolve it, either implement the missing method in the relevant class (e.g. `save_pretrained` for saving) or inherit from the appropriate base class and configuration files so the required methods are provided. Review the model's configuration, inheritance structure, and required functionality to pinpoint the missing implementation.
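A generic illustration of the pattern behind this error, not transformers code: a base class declares a method that raises `NotImplementedError`, and the subclass must supply the implementation. The class names are invented for the sketch:

```python
# Generic pattern: the base class raises until a subclass provides the method.

class BaseExporter:
    """Stands in for a base class whose required method was never overridden."""
    def save_pretrained(self, path):
        raise NotImplementedError("subclasses must implement save_pretrained")

class JsonExporter(BaseExporter):
    def save_pretrained(self, path):
        # Providing the missing implementation resolves the error.
        return "saved to " + path

print(JsonExporter().save_pretrained("out/"))  # → saved to out/
```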
FileNotFoundError (1 report): FileNotFoundError in transformers usually arises when the specified tokenizer or model files are not present in the expected location, especially with `local_files_only=True` or offline mode. Either set `local_files_only=False` on the initial load so the files can be downloaded from the Hub, or manually download the files into the specified cache directory. If the files are already downloaded but the error persists, double-check the specified path and file names.
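Before setting `local_files_only=True`, it can help to verify that the expected files actually exist locally. A hedged sketch: the helper name is made up, and the required file names vary by model (`config.json` is the usual minimum, tokenizers add files like `tokenizer.json`):

```python
# Hypothetical pre-flight check for offline loading.
import os

def missing_offline_files(model_dir, required=("config.json",)):
    """List the expected files absent from a local model directory."""
    return [name for name in required
            if not os.path.isfile(os.path.join(model_dir, name))]

# Usage: if missing_offline_files("./my-model"): reload once with local_files_only=False
```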
DistBackendError (1 report): DistBackendError usually occurs when the distributed training environment isn't properly initialized, often due to NCCL issues or incorrect configuration of `torch.distributed`. Ensure NCCL is correctly installed and configured (setting the `NCCL_DEBUG=INFO` environment variable surfaces detailed diagnostics), then double-check the `torch.distributed.init_process_group` call for correct `backend`, `rank`, and `world_size` values. Also make sure each rank selects its own GPU (e.g. `torch.cuda.set_device(local_rank)`) before initialization.
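A sketch of assembling the values `init_process_group` needs from the environment variables that launchers like torchrun set. The helper name is invented; `backend`, `rank`, and `world_size` are the real parameters, and `RANK`/`WORLD_SIZE` are the standard torchrun variables:

```python
# Hypothetical helper: read init_process_group arguments from launcher env vars.
import os

def dist_init_kwargs(env=None):
    """Build kwargs for torch.distributed.init_process_group."""
    env = env if env is not None else os.environ
    return {
        "backend": "nccl",                            # GPU training; use "gloo" on CPU
        "rank": int(env.get("RANK", 0)),              # this process's global index
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }

# Usage: torch.distributed.init_process_group(**dist_init_kwargs())
print(dist_init_kwargs({"RANK": "1", "WORLD_SIZE": "4"}))
```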
Related Data & ML Packages
TensorFlow: An Open Source Machine Learning Framework for Everyone
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
scikit-learn: machine learning in Python
pandas: Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Streamlit: A faster way to build and share data apps
Gradio: Build and share delightful machine learning apps, all in Python