v4.50.3-DeepSeek-3
📦 transformers
✨ 6 features · 🔧 4 symbols
Summary
This release introduces support for the DeepSeek-V3 (DeepSeek-R1) model, featuring the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures. It is distributed as a dedicated git tag built on top of version 4.50.3.
Migration Steps
- Install the specific version using: `pip install git+https://github.com/huggingface/transformers@v4.50.3-DeepSeek-3`
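After installing, a quick sanity check that the tagged build is the one on your Python path (the exact version string reported by the tag is an assumption; it should be based on 4.50.3):

```python
import transformers

# The tag is cut on top of v4.50.3, so the reported version should be based on it.
print(transformers.__version__)
```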
✨ New Features
- Added support for the DeepSeek-V3 (also known as DeepSeek-R1) model; see the usage sketch after this list.
- Implementation of Multi-head Latent Attention (MLA) architecture.
- Implementation of the DeepSeekMoE architecture with 671B total parameters, of which 37B are activated per token.
- Support for auxiliary-loss-free load balancing strategy.
- Support for multi-token prediction training objective.
- Native support for running the model in FP8 precision.
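A minimal usage sketch once the tagged build is installed. The checkpoint id `deepseek-ai/DeepSeek-V3`, `torch_dtype="auto"`, and `device_map="auto"` are assumptions following the usual Hub conventions, not part of this release note:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint id; substitute the repository you actually intend to load.
model_id = "deepseek-ai/DeepSeek-V3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick the dtype stored in the checkpoint
    device_map="auto",    # shard the 671B-parameter model across available devices
)

inputs = tokenizer("DeepSeek-V3 is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `device_map="auto"`, loading a model of this size requires substantial GPU memory; adjust the device map or use a smaller distilled checkpoint for experimentation.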
🔧 Affected Symbols
- `AutoModelForCausalLM`
- `AutoTokenizer`
- DeepSeek-V3
- DeepSeek-R1