Transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning in text, vision, audio, and multimodal domains, for both inference and training.
Release History
v5.0.0rc2 (Breaking, 8 fixes, 7 features): This release focuses on fixing AutoTokenizer enforcement, optimizing MoE performance with batched implementations, and significantly improving model loading speeds via meta device initialization.
v5.0.0rc1 (Breaking, 6 fixes, 7 features): This release introduces major breaking changes, including 'auto' as the default dtype, 50GB shard sizes for saving, and mandatory **kwargs in forward methods. It also adds support for new models (FastVLM, Lasr, PaddleOCR-VL) and a new dynamic weight loader for quantization.
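As a minimal sketch of how the rc1 defaults above surface in user code (the model id is a placeholder, and the dtype/max_shard_size keyword names are assumed to match the release notes):

    import torch
    from transformers import AutoModelForCausalLM

    # In v5, dtype defaults to "auto" (the checkpoint's stored precision);
    # pass an explicit torch dtype to override it.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B",  # placeholder model id
        dtype=torch.bfloat16,
    )

    # Saving now defaults to larger (50GB) shards; the size stays overridable.
    model.save_pretrained("./llama-local", max_shard_size="5GB")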
v5.0.0rc0 (Breaking, 2 fixes, 6 features): Transformers v5 introduces a major overhaul of the library, featuring a new dynamic weight loading API and a unified tokenizer backend system to simplify internals and improve performance.
v4.57.3 (2 fixes): This emergency patch fixes a critical bug affecting model loading when local_files_only is set to True and addresses a typo from a previous patch.
v4.57.2 (Breaking, 2 fixes, 3 features): This patch release focuses on fixing Mistral tokenizer mappings and Tekken pattern matching, while also correcting a decorator error in the device memory utility.
v4.57.1 (5 fixes): This patch release focuses on fixing dependency parsing issues with Optax and Poetry, alongside stability improvements for FSDP and Python 3.9 support.
v4.57.0 (6 features): This release introduces support for several next-generation model architectures, including the high-efficiency Qwen3-Next and Qwen3-VL series, the privacy-focused VaultGemma, and the high-speed Longcat Flash MoE.
v4.56.2 (3 fixes, 1 feature): This release focuses on bug fixes for the Jetmoe and Emu3 models, addresses a getter regression, and improves multi-processing performance for processors.
v4.56.1-Vault-Gemma-preview (3 features): This release introduces a preview of the VaultGemma model, a 1B parameter decoder-only model trained with sequence-level differential privacy.
v4.56.1 (Breaking, 6 fixes): This patch release primarily fixes the new 'dtype' argument in pipelines and addresses several model-specific bugs, including Llama4 accuracy and SamAttention attribute errors.
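For context, the 'dtype' pipeline argument appears to supersede the older torch_dtype keyword; a hedged sketch of its intended use (the model id is illustrative):

    import torch
    from transformers import pipeline

    # dtype selects the precision the pipeline loads the model in.
    generator = pipeline(
        "text-generation",
        model="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative model id
        dtype=torch.bfloat16,
    )
    print(generator("Hello, world!", max_new_tokens=20)[0]["generated_text"])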
v4.56.0-Embedding-Gemma-preview (3 features): This release introduces a preview of the EmbeddingGemma model, a highly efficient 308M parameter multilingual embedding model optimized for on-device RAG and mobile use cases.
v4.56.0 (1 fix, 12 features): This release introduces several major vision and multimodal models, including DINOv3, SAM 2, and Ovis 2, alongside a significant refactor of the caching system to optimize memory for sliding-window attention.
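As a rough illustration of where the cache refactor is user-visible, generate() can select a cache implementation explicitly; the "sliding_window" option name below comes from the pre-existing cache API and is an assumption, not something stated in these notes:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # A model with sliding-window attention (checkpoint id illustrative).
    ckpt = "mistralai/Mistral-7B-Instruct-v0.3"
    tok = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForCausalLM.from_pretrained(ckpt)

    inputs = tok("The quick brown fox", return_tensors="pt")
    # The sliding-window cache keeps only the most recent tokens in memory,
    # which is what the optimization above targets.
    out = model.generate(**inputs, max_new_tokens=32, cache_implementation="sliding_window")
    print(tok.decode(out[0], skip_special_tokens=True))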
v4.55.4 (1 fix): This patch release corrects a technical error in the previous release process to properly apply the fix for issue #40197.
v4.55.3 (Breaking, 5 fixes): Patch release 4.55.3 focuses on stability improvements for FlashAttention-2 on Ascend NPU, FSDP sharding fixes, and critical bug fixes for GPT-OSS and Mamba models.
v4.55.2 (Breaking, 1 fix): Patch release 4.55.2 fixes a critical regression in Flash Attention 2 (FA2) generations caused by a missing utility import in version 4.55.1.
v4.55.1 (Breaking, 9 fixes, 2 features): Patch release 4.55.1 focuses on stabilizing the MXFP4 quantization for GPT-OSS models and resolving device-related bugs across several multimodal models like Idefics and SmolVLM.
4.55.0-GLM-4.5V-preview (6 features): This release introduces GLM-4.5V, a high-performance multimodal reasoning model based on GLM-4.5-Air, featuring advanced capabilities in image, video, and GUI analysis.
v4.55.0 (Breaking, 8 features): OpenAI released GPT OSS, an open-source (Apache 2.0) MoE model family in 21B and 117B sizes featuring 4-bit MXFP4 quantization and Flash Attention 3 support. These models are optimized for reasoning and agentic tasks, compatible with the new Responses API and standard transformers workflows.
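A hedged sketch of loading one of these checkpoints through the standard workflow the notes mention; the hub id openai/gpt-oss-20b is assumed from the public release:

    from transformers import pipeline

    # The 21B-class GPT OSS checkpoint; device_map="auto" requires accelerate.
    pipe = pipeline("text-generation", model="openai/gpt-oss-20b", device_map="auto")
    messages = [{"role": "user", "content": "Explain MoE routing in one sentence."}]
    print(pipe(messages, max_new_tokens=64)[0]["generated_text"])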
4.54.1 (Breaking, 10 fixes, 2 features): A maintenance patch release focused on fixing regressions in cache inheritance, device placement, and distributed training (TP/device-mesh) across various model architectures like ModernBERT, GPT2, and Mamba.
v4.54.0 (Breaking, 1 fix, 10 features): This release focuses on reducing library bloat and increasing speed through refactored Llama models, megablocks kernel integration, and native distributed training. It also introduces several new model architectures including Ernie 4.5, Voxtral, DeepSeek-V2, and LFM2.
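As a sketch of the native distributed path referenced above, from_pretrained exposes a tp_plan argument; treat the exact invocation as an assumption based on the general tensor-parallel API rather than on these notes:

    # Launch with: torchrun --nproc-per-node 4 run_tp.py
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative model id
    # tp_plan="auto" shards supported layers across the visible GPUs.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, tp_plan="auto"
    )
    tok = AutoTokenizer.from_pretrained(model_id)

    inputs = tok("Tensor parallelism splits weight matrices", return_tensors="pt")
    print(tok.decode(model.generate(**inputs.to(model.device), max_new_tokens=16)[0]))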
v4.53.2-Ernie-4.5-preview (2 features): This preview release introduces Baidu's Ernie 4.5 model family to Transformers, including a 0.3B dense model and MoE variants (21B and 300B).
v4.53.3 (1 fix): A small patch release (v4.53.3) that refactors OpenTelemetry integration by removing explicit provider setter calls.
v4.53.2-modernbert-decoder-preview (Breaking, 5 features): This release introduces a preview of the ModernBERT Decoder, a causal language model variant of the ModernBERT architecture designed for autoregressive generation and sequence classification.
v4.53.2 (Breaking, 6 fixes, 2 features): This patch release focuses on critical bug fixes for GLM-4.1V and GLM-4V models, resolves hardware-specific issues on Ascend NPU, and deprecates the sliding window feature.
v4.53.1 (Breaking, 7 fixes, 1 feature): This patch release focuses on bug fixes for Vision Language Models (VLMs) like Qwen2-VL and SmolVLM, alongside introducing packed tensor format support for various attention backends.
v4.53.0 (8 features): Release v4.53.0 introduces several major model architectures including Gemma 3n, Dia TTS, Kyutai STT, and the massive 456B MiniMax model. The update focuses heavily on multimodal capabilities, efficient parameter usage, and long-context support.
v4.52.4-Kyutai-STT-preview (3 features): This release introduces a preview of the Kyutai-STT model architecture, featuring 1B and 2.6B parameter checkpoints for high-accuracy speech-to-text transcription.
v4.52.4-VJEPA-2-preview (3 features): This release introduces a preview of the V-JEPA 2 model, a state-of-the-art self-supervised video encoder for motion understanding and robot manipulation tasks.
v4.52.4-ColQwen2-preview (4 features): This release introduces a preview of the ColQwen2 model, a visual-based document retrieval system that leverages the Qwen2-VL backbone for late interaction similarity scoring.
v4.52.4 (Breaking, 4 fixes, 2 features): This patch release focuses on bug fixes for Vision Language Models (Qwen-VL, PaliGemma), attention scaling corrections for OPT, and compatibility improvements for older PyTorch versions.
v4.52.3 (2 fixes): This patch release fixes issues related to torch distributed initialization and protects ParallelInterface imports to ensure stability in distributed environments.
v4.52.2 (2 fixes, 2 features): This patch release re-introduces 3D parallel training support while fixing a device map override bug and improving import error clarity.
v4.52.1 (5 features): This release introduces several major multimodal and specialized models, including the Qwen2.5-Omni streaming model, the high-precision SAM-HQ segmenter, and the D-FINE real-time object detector.
v4.51.3-CSM-preview (Breaking, 5 features): This release introduces the Conversational Speech Model (CSM), an open-source contextual text-to-speech model capable of generating natural speech from multi-turn dialogue context.
v4.51.3-GraniteMoeHybrid-preview (3 features): This release introduces the GraniteMoeHybrid model architecture, a hybrid design combining state space layers and Mixture-of-Experts (MoE) attention, available as a stable preview ahead of the v4.52.0 minor release.
v4.51.3-D-FINE-preview (3 features): This release introduces a preview of the D-FINE model, a high-performance real-time object detector featuring fine-grained distribution refinement for superior localization accuracy.
v4.51.3-SAM-HQ-preview (5 features): This release introduces a preview of SAM-HQ, an enhancement to the Segment Anything Model that provides higher quality segmentation masks with minimal additional parameters.
v4.51.3-BitNet-preview (2 features): This preview release introduces the BitNet model architecture to the transformers library, enabling high-performance 1-bit LLM inference.
v4.51.3-LlamaGuard-preview (5 features): This release introduces LlamaGuard 4 and Llama Prompt Guard 2, providing multimodal safety moderation for text and images. It is available as a preview tag prior to the official v4.52.0 minor release.
v4.51.3-Qwen2.5-Omni-preview (6 features): This release introduces Qwen2.5-Omni, an end-to-end multimodal model capable of perceiving text, images, audio, and video while generating synchronized text and speech responses.
v4.51.3-InternVL-preview (5 features): This preview release introduces support for the InternVL 2.5 and 3 family of multimodal models, featuring a native multimodal pre-training paradigm and state-of-the-art performance on visual-linguistic tasks.
v4.51.3-Janus-preview (5 features): This release introduces a preview of the Janus and Janus-Pro models, a unified multimodal framework capable of both visual understanding and text-to-image generation by decoupling visual encoding pathways.
v4.51.3-TimesFM-preview (3 features): This release introduces a preview of TimesFM, a decoder-only foundation model for time-series forecasting, available as a specialized tag on top of transformers v4.51.3.
v4.51.3-MLCD-preview (3 features): This release introduces a preview of the MLCD vision model, a foundational visual model optimized for multimodal LLMs like LLaVA, developed by DeepGlint-AI.
v4.51.3 (2 fixes, 1 feature): This patch release introduces support for the GLM-4 model and includes several fixes for PyTorch version compatibility, specifically regarding FlexAttention.
v4.51.2 (3 fixes, 1 feature): A minor patch release focusing on Llama4 model corrections and the introduction of Attention Quantization with FBGemm and Tensor Parallelism.
v4.51.1 (8 fixes): This patch release focuses on stabilizing Llama 4 support and fixing compatibility issues with torch 2.6.0, DeepSpeed, and weight initialization.
v4.51.0 (Breaking, 9 fixes, 6 features): This release introduces support for Llama 4, Phi4-Multimodal, DeepSeek-v3, and Qwen3 architectures, alongside a major documentation overhaul and modularization of speech models.
v4.50.3-DeepSeek-3 (6 features): This release introduces support for the DeepSeek-V3 (DeepSeek-R1) model, featuring MLA and DeepSeekMoE architectures, available via a specific git tag on top of version 4.50.3.
v4.50.3 (3 fixes): This patch release fixes bugs related to beam search output cropping, BLIP-2 floating-point precision mismatches, and PixtralProcessor configuration.
v4.50.2 (2 fixes, 1 feature): A patch release focusing on backend stability, specifically fixing image processing for Gemma3 and Qwen2-VL, and updating torch version validation.
v4.50.1 (4 fixes): A patch release addressing minor bugs in hub kernels, remote code, and specific model implementations like Chameleon and PyTorch deformable attention.
v4.50.0 (1 fix, 7 features): Release v4.50.0 introduces a new model-based release strategy and adds support for several major vision-language models including Gemma 3, Aya Vision, Mistral 3.1, and SigLIP-2.
v4.49.0-Mistral-3 (7 features): This release introduces Mistral 3 (Mistral Small 3.1) to the Transformers library, a 24B parameter model featuring 128k context length and advanced vision-language capabilities.
v4.49.0-Gemma-3 (6 features): This release introduces Google's Gemma 3 multimodal models to the transformers library, featuring a SigLIP vision encoder and Gemma 2 language decoder with support for high-resolution image cropping and multi-image inference.
v4.49.0-AyaVision (4 features): This release introduces Aya Vision 8B and 32B, multilingual multimodal models combining SigLIP-2 vision encoders with Cohere language models, available via a specialized transformers release tag.
v4.49.0-SigLIP-2 (Breaking, 5 features): This release introduces SigLIP-2, a new family of multilingual vision-language encoders featuring improved semantic understanding and support for native aspect ratio image processing via the NaFlex variant.
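A short, hedged example of a SigLIP-2 checkpoint in zero-shot image classification; the checkpoint id is assumed from the family's hub naming (the NaFlex variant reportedly carries a -naflex suffix):

    from transformers import pipeline

    # Checkpoint id assumed from the SigLIP-2 hub naming scheme.
    clf = pipeline("zero-shot-image-classification", model="google/siglip2-base-patch16-224")
    result = clf(
        "http://images.cocodataset.org/val2017/000000039769.jpg",
        candidate_labels=["two cats", "a dog", "a plane"],
    )
    print(result[0])  # highest-scoring label first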
v4.49.0-SmolVLM-2 (4 features): This release introduces SmolVLM-2, a lightweight vision-language model based on Idefics3 and SmolLM2 that supports multi-image and video processing.
v4.49.0 (Breaking, 2 fixes, 12 features): This release introduces several new models including Helium, Qwen2.5-VL, and Zamba2, alongside a new CLI chat feature and standardized fast image processors.
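Of the items above, the fast image processors are directly user-facing: passing use_fast=True opts in wherever a fast (torchvision-backed) class exists for the checkpoint. A minimal sketch with an illustrative checkpoint (the chat feature itself is launched from a terminal via the transformers CLI rather than from Python):

    from transformers import AutoImageProcessor

    # use_fast=True selects the fast processor class (e.g. a *Fast variant)
    # when one is available for this checkpoint.
    processor = AutoImageProcessor.from_pretrained(
        "openai/clip-vit-base-patch32",  # illustrative checkpoint
        use_fast=True,
    )
    print(type(processor).__name__)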
v4.48.3 (4 fixes): This patch release primarily addresses Python 3.9 compatibility issues, fixes device failures in the RoPE module, and resolves generation bugs for PaliGemma2.
v4.48.2 (Breaking, 5 fixes): This patch release primarily restores Python 3.9 compatibility and fixes regressions related to DBRX model loading and HybridCache mask slicing.
v4.48.1 (3 fixes): Patch release v4.48.1 fixes a typo in Phi model attention bias, resolves a logic error in gradient accumulation loss, and patches Moonshine's generate wrapper.
v4.48.0 (8 features): This release introduces several major model architectures including ModernBERT, Aria (MoE), and Bamba (Mamba-2), while adding a TimmWrapper to integrate timm library models directly into the Transformers ecosystem.
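The TimmWrapper mentioned above lets timm checkpoints load through the regular Auto classes; a minimal sketch, with the checkpoint id assumed from timm's hub naming:

    import torch
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    # Any timm hub checkpoint (timm/... namespace) loads via the wrapper.
    ckpt = "timm/resnet18.a1_in1k"  # illustrative timm checkpoint
    processor = AutoImageProcessor.from_pretrained(ckpt)
    model = AutoModelForImageClassification.from_pretrained(ckpt)

    pixel_values = torch.randn(1, 3, 224, 224)  # stand-in for a processed image
    logits = model(pixel_values=pixel_values).logits
    print(logits.shape)  # (1, num_classes)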