v4.50.3-DeepSeek-3
📦 transformers
✨ 6 features · 🔧 4 symbols
Summary
This release introduces support for the DeepSeek-V3 (DeepSeek-R1) model, featuring the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures. It is distributed as a dedicated git tag built on top of version 4.50.3.
Migration Steps
- Install the specific version using: `pip install git+https://github.com/huggingface/transformers@v4.50.3-DeepSeek-3`
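After installing, a quick sanity check that the tagged build is the one on your Python path (the exact version string reported by the tag is an assumption; it should be based on 4.50.3):

```python
import transformers

# The tag is cut on top of v4.50.3, so the reported version should be based on it.
print(transformers.__version__)
```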
✨ New Features
- Added support for the DeepSeek-V3 (also known as DeepSeek-R1) model; see the usage sketch after this list.
- Implementation of Multi-head Latent Attention (MLA) architecture.
- Implementation of the DeepSeekMoE architecture with 671B total parameters, of which 37B are activated per token.
- Support for auxiliary-loss-free load balancing strategy.
- Support for multi-token prediction training objective.
- Native support for running the model in FP8 precision.
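A minimal usage sketch once the tagged build is installed. The checkpoint id `deepseek-ai/DeepSeek-V3`, `torch_dtype="auto"`, and `device_map="auto"` are assumptions following the usual Hub conventions, not part of this release note:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint id; substitute the repository you actually intend to load.
model_id = "deepseek-ai/DeepSeek-V3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick the dtype stored in the checkpoint
    device_map="auto",    # shard the 671B-parameter model across available devices
)

inputs = tokenizer("DeepSeek-V3 is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `device_map="auto"`, loading a model of this size requires substantial GPU memory; adjust the device map or use a smaller distilled checkpoint for experimentation.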
🔧 Affected Symbols
- `AutoModelForCausalLM`
- `AutoTokenizer`
- DeepSeek-V3
- DeepSeek-R1