Transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.

Latest: v5.0.0rc2 · 63 releases · 20 breaking changes

Release History

v5.0.0rc2 · Breaking · 8 fixes · 7 features
Jan 8, 2026

This release focuses on fixing AutoTokenizer enforcement, optimizing MoE performance with batched implementations, and significantly improving model loading speeds via meta device initialization.
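
A minimal sketch of the loading path these changes touch, assuming a placeholder checkpoint id ("gpt2"):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Tokenizer loading goes through AutoTokenizer, the enforcement path this
# release candidate fixes ("gpt2" is just a placeholder checkpoint).
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# v5 materializes weights from the meta device during loading, which is
# where the speedup lands; the call itself is unchanged.
model = AutoModelForCausalLM.from_pretrained("gpt2")
```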

v5.0.0rc1 · Breaking · 6 fixes · 7 features
Jan 8, 2026

This release introduces major breaking changes including 'auto' as the default dtype, 50GB shard sizes for saving, and mandatory **kwargs in forward methods. It also adds support for new models (FastVLM, Lasr, PaddleOCR-VL) and a new dynamic weight loader for quantization.
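
A hedged sketch of what the new defaults mean in user code; the explicit arguments below simply pin behavior and are illustrative, not taken from the release notes:

```python
from transformers import AutoModelForCausalLM

# With "auto" as the default dtype, weights load in the precision stored in
# the checkpoint; passing dtype explicitly restores a fixed precision.
model = AutoModelForCausalLM.from_pretrained("gpt2", dtype="auto")

# Saving now defaults to 50GB shards; max_shard_size still overrides it.
model.save_pretrained("./gpt2-saved", max_shard_size="5GB")
```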

v5.0.0rc0 · Breaking · 2 fixes · 6 features
Dec 1, 2025

Transformers v5 introduces a major overhaul of the library, featuring a new dynamic weight loading API and a unified tokenizer backend system to simplify internals and improve performance.

v4.57.3 · 2 fixes
Nov 25, 2025

This emergency patch fixes a critical bug affecting model loading when local_files_only is set to True and addresses a typo from a previous patch.
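
The affected flag in context, as a minimal sketch (the checkpoint id is a placeholder):

```python
from transformers import AutoModel

# local_files_only=True skips the Hub and resolves the checkpoint from the
# local cache only; this is the code path the patch repairs.
model = AutoModel.from_pretrained("gpt2", local_files_only=True)
```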

v4.57.2 · Breaking · 2 fixes · 3 features
Nov 24, 2025

This patch release focuses on fixing Mistral tokenizer mappings and Tekken pattern matching, while also correcting a decorator error in the device memory utility.

v4.57.1 · 5 fixes
Oct 14, 2025

This patch release focuses on fixing dependency parsing issues with Optax and Poetry, alongside stability improvements for FSDP and Python 3.9 support.

v4.57.0 · 6 features
Oct 3, 2025

This release introduces support for several next-generation model architectures, including the high-efficiency Qwen3-Next and Qwen3-VL series, the privacy-focused VaultGemma, and the high-speed Longcat Flash MoE.

v4.56.2 · 3 fixes · 1 feature
Sep 17, 2025

This release focuses on bug fixes for the JetMoe and Emu3 models, addresses a getter regression, and improves multiprocessing performance for processors.

v4.56.1-Vault-Gemma-preview · 3 features
Sep 12, 2025

This release introduces a preview of the Vault-Gemma model, a 1B parameter decoder-only model trained with sequence-level differential privacy.

v4.56.1 · Breaking · 6 fixes
Sep 4, 2025

This patch release primarily fixes the new 'dtype' argument in pipelines and addresses several model-specific bugs including Llama4 accuracy and SamAttention attribute errors.
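
A sketch of the fixed argument; before 4.56 the same option was spelled torch_dtype, so treat the exact keyword as version-dependent:

```python
import torch

from transformers import pipeline

# The new 'dtype' pipeline argument this patch fixes ("gpt2" is a placeholder).
pipe = pipeline("text-generation", model="gpt2", dtype=torch.float16)
print(pipe("Hello", max_new_tokens=5)[0]["generated_text"])
```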

v4.56.0-Embedding-Gemma-preview · 3 features
Sep 4, 2025

This release introduces a preview of the EmbeddingGemma model, a highly efficient 308M parameter multilingual embedding model optimized for on-device RAG and mobile use cases.

v4.56.0 · 1 fix · 12 features
Aug 29, 2025

This release introduces several major vision and multimodal models including DINOv3, SAM 2, and Ovis 2, alongside a significant refactor of the caching system to optimize memory for sliding-window attention.

v4.55.4 · 1 fix
Aug 22, 2025

This patch release corrects a technical error in the previous release process to properly apply the fix for issue #40197.

v4.55.3 · Breaking · 5 fixes
Aug 21, 2025

Patch release 4.55.3 focuses on stability improvements for FlashAttention-2 on Ascend NPU, FSDP sharding fixes, and critical bug fixes for GPT-OSS and Mamba models.

v4.55.2 · Breaking · 1 fix
Aug 13, 2025

Patch release 4.55.2 fixes a critical regression in Flash Attention 2 (FA2) generations caused by a missing utility import in version 4.55.1.
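
For context, FA2 is opted into at load time; a minimal sketch, assuming an FA2-capable checkpoint, a CUDA device, and flash-attn installed:

```python
import torch

from transformers import AutoModelForCausalLM

# attn_implementation="flash_attention_2" selects the FA2 code path whose
# generations regressed in 4.55.1; the checkpoint id is an assumption.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```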

v4.55.1 · Breaking · 9 fixes · 2 features
Aug 13, 2025

Patch release 4.55.1 focuses on stabilizing the MXFP4 quantization for GPT-OSS models and resolving device-related bugs across several multimodal models like Idefics and SmolVLM.

v4.55.0-GLM-4.5V-preview · 6 features
Aug 11, 2025

This release introduces GLM-4.5V, a high-performance multimodal reasoning model based on GLM-4.5-Air, featuring advanced capabilities in image, video, and GUI analysis.

v4.55.0 · Breaking · 8 features
Aug 5, 2025

OpenAI released GPT OSS, an open-source (Apache 2.0) MoE model family in 21B and 117B sizes featuring 4-bit MXFP4 quantization and Flash Attention 3 support. These models are optimized for reasoning and agentic tasks, compatible with the new Responses API and standard transformers workflows.
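
A hedged sketch of the "standard transformers workflows" mentioned above; the checkpoint id openai/gpt-oss-20b is an assumption, not taken from these notes:

```python
from transformers import pipeline

# Chat-style generation with the smaller GPT OSS variant; device_map="auto"
# spreads the MoE weights across available devices.
pipe = pipeline("text-generation", model="openai/gpt-oss-20b", device_map="auto")
messages = [{"role": "user", "content": "Summarize MXFP4 quantization in one line."}]
print(pipe(messages, max_new_tokens=64)[0]["generated_text"])
```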

v4.54.1 · Breaking · 10 fixes · 2 features
Jul 29, 2025

A maintenance patch release focused on fixing regressions in cache inheritance, device placement, and distributed training (TP/device-mesh) across various model architectures like ModernBERT, GPT2, and Mamba.

v4.54.0 · Breaking · 1 fix · 10 features
Jul 25, 2025

This release focuses on reducing library bloat and increasing speed through refactored Llama models, megablocks kernel integration, and native distributed training. It also introduces several new model architectures including Ernie 4.5, Voxtral, DeepSeek-V2, and LFM2.

v4.53.2-Ernie-4.5-preview · 2 features
Jul 23, 2025

This preview release introduces Baidu's Ernie 4.5 model family to Transformers, including a 0.3B dense model and MoE variants (21B and 300B).

v4.53.3 · 1 fix
Jul 22, 2025

A small patch release (v4.53.3) that refactors OpenTelemetry integration by removing explicit provider setter calls.

v4.53.2-modernbert-decoder-preview · Breaking · 5 features
Jul 16, 2025

This release introduces a preview of the ModernBERT Decoder, a causal language model variant of the ModernBERT architecture designed for autoregressive generation and sequence classification.

v4.53.2 · Breaking · 6 fixes · 2 features
Jul 11, 2025

This patch release focuses on critical bug fixes for GLM-4.1V and GLM-4V models, resolves hardware-specific issues on Ascend NPU, and deprecates the sliding window feature.

v4.53.1 · Breaking · 7 fixes · 1 feature
Jul 4, 2025

This patch release focuses on bug fixes for Vision Language Models (VLMs) like Qwen2-VL and SmolVLM, alongside introducing packed tensor format support for various attention backends.

v4.53.0 · 8 features
Jun 26, 2025

Release v4.53.0 introduces several major model architectures including Gemma 3n, Dia TTS, Kyutai STT, and the massive 456B MiniMax model. The update focuses heavily on multimodal capabilities, efficient parameter usage, and long-context support.

v4.52.4-Kyutai-STT-preview · 3 features
Jun 24, 2025

This release introduces a preview of the Kyutai-STT model architecture, featuring 1B and 2.6B parameter checkpoints for high-accuracy speech-to-text transcription.

v4.52.4-VJEPA-2-preview · 3 features
Jun 11, 2025

This release introduces a preview of the V-JEPA 2 model, a state-of-the-art self-supervised video encoder for motion understanding and robot manipulation tasks.

v4.52.4-ColQwen2-preview · 4 features
Jun 2, 2025

This release introduces a preview of the ColQwen2 model, a visual-based document retrieval system that leverages the Qwen2-VL backbone for late interaction similarity scoring.

v4.52.4 · Breaking · 4 fixes · 2 features
May 30, 2025

This patch release focuses on bug fixes for Vision Language Models (Qwen-VL, PaliGemma), attention scaling corrections for OPT, and compatibility improvements for older PyTorch versions.

v4.52.3 · 2 fixes
May 22, 2025

This patch release fixes issues related to torch distributed initialization and protects ParallelInterface imports to ensure stability in distributed environments.

v4.52.2 · 2 fixes · 2 features
May 21, 2025

This patch release re-introduces 3D parallel training support while fixing a device map override bug and improving import error clarity.

v4.52.1 · 5 features
May 20, 2025

This release introduces several major multimodal and specialized models, including the Qwen2.5-Omni streaming model, the high-precision SAM-HQ segmenter, and the D-FINE real-time object detector.

v4.51.3-CSM-preview · Breaking · 5 features
May 8, 2025

This release introduces the Conversational Speech Model (CSM), an open-source contextual text-to-speech model capable of generating natural speech from multi-turn dialogue context.

v4.51.3-GraniteMoeHybrid-preview · 3 features
May 8, 2025

This release introduces the GraniteMoeHybrid model architecture, a hybrid design combining state space layers and Mixture-of-Experts (MoE) attention, available as a stable preview ahead of the v4.52.0 minor release.

v4.51.3-D-FINE-preview · 3 features
May 8, 2025

This release introduces a preview of the D-FINE model, a high-performance real-time object detector featuring fine-grained distribution refinement for superior localization accuracy.

v4.51.3-SAM-HQ-preview · 5 features
May 8, 2025

This release introduces a preview of SAM-HQ, an enhancement to the Segment Anything Model that provides higher quality segmentation masks with minimal additional parameters.

v4.51.3-BitNet-preview · 2 features
May 8, 2025

This preview release introduces the BitNet model architecture to the transformers library, enabling high-performance 1-bit LLM inference.

v4.51.3-LlamaGuard-preview · 5 features
Apr 30, 2025

This release introduces LlamaGuard 4 and Llama Prompt Guard 2, providing multimodal safety moderation for text and images. It is available as a preview tag prior to the official v4.52.0 minor release.

v4.51.3-Qwen2.5-Omni-preview · 6 features
Apr 24, 2025

This release introduces Qwen2.5-Omni, an end-to-end multimodal model capable of perceiving text, images, audio, and video while generating synchronized text and speech responses.

v4.51.3-InternVL-preview · 5 features
Apr 22, 2025

This preview release introduces support for the InternVL 2.5 and 3 family of multimodal models, featuring a native multimodal pre-training paradigm and state-of-the-art performance on visual-linguistic tasks.

v4.51.3-Janus-preview · 5 features
Apr 22, 2025

This release introduces a preview of the Janus and Janus-Pro models, a unified multimodal framework capable of both visual understanding and text-to-image generation by decoupling visual encoding pathways.

v4.51.3-TimesFM-preview · 3 features
Apr 22, 2025

This release introduces a preview of TimesFM, a decoder-only foundation model for time-series forecasting, available as a specialized tag on top of transformers v4.51.3.

v4.51.3-MLCD-preview · 3 features
Apr 22, 2025

This release introduces a preview of the MLCD vision model, a foundational visual model optimized for multimodal LLMs like LLaVA, developed by DeepGlint-AI.

v4.51.3 · 2 fixes · 1 feature
Apr 14, 2025

This patch release introduces support for the GLM-4 model and includes several fixes for PyTorch version compatibility, specifically regarding FlexAttention.
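
FlexAttention is selected like any other attention backend; a minimal sketch, assuming a PyTorch build that ships flex_attention and a supported architecture ("gpt2" is a placeholder):

```python
from transformers import AutoModelForCausalLM

# attn_implementation="flex_attention" routes attention through PyTorch's
# FlexAttention kernels, the integration these compatibility fixes target.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", attn_implementation="flex_attention"
)
```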

v4.51.2 · 3 fixes · 1 feature
Apr 10, 2025

A minor patch release focusing on Llama4 model corrections and introducing attention quantization with FBGEMM and tensor parallelism.

v4.51.1 · 8 fixes
Apr 8, 2025

This patch release focuses on stabilizing Llama 4 support and fixing compatibility issues with torch 2.6.0, DeepSpeed, and weight initialization.

v4.51.0 · Breaking · 9 fixes · 6 features
Apr 5, 2025

This release introduces support for Llama 4, Phi4-Multimodal, DeepSeek-v3, and Qwen3 architectures, alongside a major documentation overhaul and modularization of speech models.

v4.50.3-DeepSeek-3 · 6 features
Mar 28, 2025

This release introduces support for the DeepSeek-V3 (DeepSeek-R1) model, featuring MLA and DeepSeekMoE architectures, available via a specific git tag on top of version 4.50.3.

v4.50.3 · 3 fixes
Mar 28, 2025

This patch release fixes bugs related to beam search output cropping, BLIP-2 floating-point precision mismatches, and PixtralProcessor configuration.

v4.50.2 · 2 fixes · 1 feature
Mar 27, 2025

A patch release focusing on backend stability, specifically fixing image processing for Gemma3 and Qwen2-VL, and updating torch version validation.

v4.50.1 · 4 fixes
Mar 25, 2025

A patch release addressing minor bugs in hub kernels, remote code, and specific model implementations like Chameleon and PyTorch deformable attention.

v4.50.0 · 1 fix · 7 features
Mar 21, 2025

Release v4.50.0 introduces a new model-based release strategy and adds support for several major vision-language models including Gemma 3, Aya Vision, Mistral 3.1, and SigLIP-2.

v4.49.0-Mistral-3 · 7 features
Mar 18, 2025

This release introduces Mistral 3 (Mistral Small 3.1) to the Transformers library, a 24B parameter model featuring 128k context length and advanced vision-language capabilities.

v4.49.0-Gemma-3 · 6 features
Mar 18, 2025

This release introduces Google's Gemma 3 multimodal models to the transformers library, featuring a SigLIP vision encoder and Gemma 2 language decoder with support for high-resolution image cropping and multi-image inference.

v4.49.0-AyaVision · 4 features
Mar 4, 2025

This release introduces Aya Vision 8B and 32B, multilingual multimodal models combining SigLIP-2 vision encoders with Cohere language models, available via a specialized transformers release tag.

v4.49.0-SigLIP-2 · Breaking · 5 features
Feb 21, 2025

This release introduces SigLIP-2, a new family of multilingual vision-language encoders featuring improved semantic understanding and support for native aspect ratio image processing via the NaFlex variant.

v4.49.0-SmolVLM-2 · 4 features
Feb 20, 2025

This release introduces SmolVLM-2, a lightweight vision-language model based on Idefics3 and SmolLM2 that supports multi-image and video processing.

v4.49.0 · Breaking · 2 fixes · 12 features
Feb 17, 2025

This release introduces several new models including Helium, Qwen2.5-VL, and Zamba2, alongside a new CLI chat feature and standardized fast image processors.
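
A sketch of the standardized fast image processors: use_fast opts into the fast implementation where one exists (the checkpoint id is a placeholder):

```python
from transformers import AutoImageProcessor

# use_fast=True requests the standardized fast image-processor variant.
processor = AutoImageProcessor.from_pretrained(
    "google/vit-base-patch16-224", use_fast=True
)
```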

v4.48.3 · 4 fixes
Feb 7, 2025

This patch release primarily addresses Python 3.9 compatibility issues, fixes device failures in the RoPE module, and resolves generation bugs for PaliGemma2.

v4.48.2 · Breaking · 5 fixes
Jan 30, 2025

This patch release primarily restores Python 3.9 compatibility and fixes regressions related to DBRX model loading and HybridCache mask slicing.

v4.48.1 · 3 fixes
Jan 20, 2025

Patch release v4.48.1 fixes a typo in Phi model attention bias, resolves a logic error in gradient accumulation loss, and patches Moonshine's generate wrapper.

v4.48.0 · 8 features
Jan 10, 2025

This release introduces several major model architectures including ModernBERT, Aria (MoE), and Bamba (Mamba-2), while adding a TimmWrapper to integrate timm library models directly into the Transformers ecosystem.
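
A hedged sketch of the TimmWrapper path: timm checkpoints on the Hub load through the regular Auto classes (the checkpoint id is an assumption):

```python
from transformers import AutoModelForImageClassification

# A timm checkpoint loads via TimmWrapper behind the familiar Auto API.
model = AutoModelForImageClassification.from_pretrained("timm/resnet18.a1_in1k")
```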