v5.8.0
📦 transformers
⚠ 1 breaking · ✨ 6 features · 🐛 10 fixes · 🔧 5 symbols
Summary
Version 5.8.0 adds support for several new models, including DeepSeek V4, Gemma 4 Assistant, and a number of vision and speech models. It also removes the Apex integration, so code that relied on it must migrate to native PyTorch equivalents.
⚠️ Breaking Changes
- Apex integration has been removed from the library (including RMSNorm usage in T5 and related models). Users relying on Apex for mixed precision or fused ops should migrate to PyTorch's native equivalents instead.
Migration Steps
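- Replace Apex-based mixed precision and fused ops with PyTorch's native equivalents (for example, torch.autocast for mixed precision and torch.nn.RMSNorm in place of Apex's fused RMSNorm).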
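As a rough illustration of what such a migration can look like, the sketch below uses only stock PyTorch. It is not taken from the release itself: the helper `make_rms_norm` and the class `FallbackRMSNorm` are hypothetical names introduced here, and the fallback exists only because `torch.nn.RMSNorm` is assumed to be available from PyTorch 2.4 onward. CPU with bfloat16 is used purely so the example runs without a GPU.

```python
import torch
from torch import nn


class FallbackRMSNorm(nn.Module):
    """Plain-PyTorch RMSNorm, used only when torch.nn.RMSNorm is unavailable.

    Hypothetical helper for this sketch, not part of transformers.
    """

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square over the last dimension.
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)


def make_rms_norm(hidden_size: int, eps: float = 1e-6) -> nn.Module:
    # Prefer the built-in module over an Apex fused kernel when it exists.
    builtin = getattr(nn, "RMSNorm", None)
    if builtin is not None:
        return builtin(hidden_size, eps=eps)
    return FallbackRMSNorm(hidden_size, eps=eps)


torch.manual_seed(0)
x = torch.randn(4, 16)

# Fused-op replacement: native RMSNorm applied in float32.
norm = make_rms_norm(16)
normed = norm(x)

# Mixed-precision replacement: torch.autocast instead of apex.amp.
linear = nn.Linear(16, 8)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = linear(x)
```

On a CUDA device the same pattern applies with `device_type="cuda"`; float16 training there is usually paired with gradient scaling, which this sketch leaves out.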
✨ New Features
- Added DeepSeek V4 models (DeepSeek-V4-Flash, DeepSeek-V4-Pro, and -Base variants) featuring a new MoE architecture.
- Added Gemma 4 Assistant model for speculative decoding using Multi-Token Prediction (MTP).
- Added GraniteSpeechPlus, a multimodal speech-to-text model that enhances the projector by feeding it concatenated intermediate hidden states.
- Added Granite4Vision (Granite Vision 4.1), a vision-language model for document data extraction specializing in chart and table extraction.
- Added EXAONE 4.5, an open-weight vision language model with expanded context window support (up to 256K tokens) and MTP mechanism.
- Added PP-FormulaNet-L and PP-FormulaNet_plus-L models for table structure recognition and image-to-text tasks involving mathematical formulas.
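For readers unfamiliar with speculative decoding, which the Gemma 4 Assistant entry above refers to, here is a minimal toy sketch of the general draft-then-verify idea under greedy decoding. It is not the Gemma 4 or MTP implementation: the "models" are plain Python functions, all names (`speculative_decode`, `target_next`, `draft_next`) are hypothetical, and the target is called once per position for clarity, whereas a real system would score all drafted positions in a single forward pass.

```python
from typing import Callable, List

# A "model" here is just a function mapping a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(
    target_next: NextToken,
    draft_next: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    num_draft_tokens: int = 4,
) -> List[int]:
    """Greedy draft-then-verify decoding with toy next-token functions."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The cheap draft model proposes a short continuation.
        proposal: List[int] = []
        for _ in range(num_draft_tokens):
            proposal.append(draft_next(tokens + proposal))

        # 2. The target model checks each proposed token in order.
        for drafted in proposal:
            expected = target_next(tokens)
            # Always keep the target's token, so output matches plain greedy.
            tokens.append(expected)
            # Stop this round at the first disagreement or at the length limit.
            if expected != drafted or len(tokens) >= limit:
                break
    return tokens


def target_next(seq: List[int]) -> int:
    # Toy target: count upward modulo 10.
    return (seq[-1] + 1) % 10


def draft_next(seq: List[int]) -> int:
    # Toy draft: agrees with the target except after tokens where t % 3 == 2.
    return 0 if seq[-1] % 3 == 2 else (seq[-1] + 1) % 10


result = speculative_decode(target_next, draft_next, prompt=[0], max_new_tokens=8)
```

Because every appended token comes from the target, the output is identical to ordinary greedy decoding with the target alone; the draft only determines how many positions can be confirmed per round. In transformers, assisted generation is generally requested by passing an assistant model to `generate` through its `assistant_model` argument; how Gemma 4 Assistant plugs into that is not stated in these notes.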
🐛 Bug Fixes
- Fixed tokenizer mapping issues for DeepSeek R1 distilled (Qwen2) and DeepSeek OCR models.
- Resolved a significant performance regression in PreTrainedTokenizer.convert_ids_to_tokens when skip_special_tokens=True, resulting in a ~300x speedup.
- Corrected spelling in continuous_api docstring.
- Fixed link to modular transformers documentation.
- Fixed failing test cases for Gemma4.
- Unwrapped text_config in AutoModelFor*.from_config.
- Added MPS to the list of float fallback backends.
- Fixed support for YaRN's apply_scale in the MINISTRAL3 conversion script.
- Ensured the _no_reinit flag is respected on dt_bias and out_proj.weight for nemotron_h.
- Resolved backbone utils test regressions.
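The notes above do not say what caused the convert_ids_to_tokens regression, so the following is only a standalone sketch of one common way this kind of filtering is kept cheap. It is not the library's code: `toy_convert_ids_to_tokens`, the vocabulary, and the IDs are all hypothetical. The point it illustrates is building the set of special IDs once, before the loop, so each per-token membership check is constant-time.

```python
from typing import Dict, Iterable, List


def toy_convert_ids_to_tokens(
    ids: Iterable[int],
    id_to_token: Dict[int, str],
    special_ids: Iterable[int],
    skip_special_tokens: bool = False,
) -> List[str]:
    """Map IDs to tokens, optionally dropping special tokens."""
    # Build the lookup set a single time instead of once per token.
    special = frozenset(special_ids) if skip_special_tokens else frozenset()
    return [id_to_token[i] for i in ids if i not in special]


# Hypothetical toy vocabulary for the example.
vocab = {0: "<pad>", 1: "<s>", 2: "hello", 3: "world"}
ids = [1, 2, 3, 0, 0]

kept = toy_convert_ids_to_tokens(ids, vocab, special_ids=[0, 1], skip_special_tokens=True)
everything = toy_convert_ids_to_tokens(ids, vocab, special_ids=[0, 1])
```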