
v4.50.3-DeepSeek-3

📦 transformers
✨ 6 features · 🔧 4 symbols

Summary

This release introduces support for the DeepSeek-V3 (DeepSeek-R1) model, featuring MLA and DeepSeekMoE architectures, available via a specific git tag on top of version 4.50.3.

Migration Steps

  1. Install the specific version using: pip install git+https://github.com/huggingface/transformers@v4.50.3-DeepSeek-3

✨ New Features

  • Added support for DeepSeek-V3 (also known as DeepSeek-R1) model.
  • Implementation of Multi-head Latent Attention (MLA) architecture.
  • Implementation of the DeepSeekMoE architecture, with 671B total parameters and 37B activated per token.
  • Support for auxiliary-loss-free load balancing strategy.
  • Support for multi-token prediction training objective.
  • Native support for running the model in FP8 precision.

🔧 Affected Symbols

  • AutoModelForCausalLM
  • AutoTokenizer
  • DeepSeek-V3
  • DeepSeek-R1
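
Once the tagged build is installed, the model can be loaded through the Auto classes listed above. The following is a minimal sketch, not taken from the release notes: the Hub repository id `deepseek-ai/DeepSeek-V3` and the generation settings are assumptions for illustration.

```python
# Hedged usage sketch: loading DeepSeek-V3 via the Auto classes from this release.
# The Hub id and keyword arguments below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-V3"  # assumed Hub repository id


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model lazily and run a single generation pass."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="auto",     # shard the weights across available devices
        torch_dtype="auto",    # keep the checkpoint's native precision
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Note that at 671B total parameters the full checkpoint requires a multi-GPU node; the sketch only shows the API shape, not a deployment recipe.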