
v4.51.1

📦 transformers
🐛 8 fixes · 🔧 6 symbols

Summary

This patch release focuses on stabilizing Llama 4 support and fixing compatibility issues with torch 2.6.0, DeepSpeed, and weight initialization.

Migration Steps

  1. Update the library to v4.51.1 using your package manager (e.g., pip install --upgrade transformers); see the verification sketch after this list.
  2. If you were hitting flex attention errors with torch 2.6.0, they should now be resolved.
  3. If you were using HQQ with the caching allocator warmup, note that HQQ has been removed from this specific warmup step; review your initialization logic if this affects performance or startup time.
  4. If you encountered issues initializing weights when not using the accelerate library, these fixes should resolve them.
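
The sketch below is a hedged example, not part of the release notes: it checks the installed versions and loads a model on the flex attention path that this release fixes for torch 2.6.0. The checkpoint id and dtype are placeholders to adapt to your setup.

```python
# Minimal verification sketch. Assumptions: the checkpoint id below is a
# placeholder, and flex attention is selected via the attn_implementation
# argument of from_pretrained.
import torch
import transformers
from transformers import AutoModelForCausalLM

print(transformers.__version__)  # expect "4.51.1" (or later)
print(torch.__version__)         # the flex attention fixes target torch 2.6.0

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",  # placeholder checkpoint id
    attn_implementation="flex_attention",         # exercises the fixed code path
    torch_dtype=torch.bfloat16,                   # adjust to your hardware
)
```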

🐛 Bug Fixes

  • Fixed flex attention compatibility for torch 2.6.0
  • Resolved issues with post-training and general training for Llama 4
  • Removed HQQ from caching allocator warmup
  • Fixed _init_weights for models derived from BERT (see the sketch after this list)
  • Fixed initialization of empty weights when accelerate is not present
  • Fixed DeepSpeed integration with quantization
  • Fixed flex attention when optional arguments are omitted
  • General stability fixes for Llama 4
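
As a worked example of where _init_weights on models derived from BERT comes into play, here is a minimal sketch (the class name, classifier head, and checkpoint id are hypothetical placeholders): post_init() initializes the newly added head while the base encoder weights are loaded from the checkpoint.

```python
# Sketch of the standard derived-model pattern, assuming a BERT base
# checkpoint; only the classifier head is initialized by _init_weights.
import torch.nn as nn
from transformers import BertConfig, BertModel, BertPreTrainedModel


class BertWithCustomHead(BertPreTrainedModel):
    def __init__(self, config: BertConfig):
        super().__init__(config)
        self.bert = BertModel(config)
        self.classifier = nn.Linear(config.hidden_size, 2)
        # post_init() applies _init_weights to modules that have no
        # pretrained weights (here, self.classifier).
        self.post_init()

    def forward(self, input_ids, attention_mask=None):
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        return self.classifier(outputs.last_hidden_state[:, 0])


# Placeholder checkpoint id; the encoder is loaded from the checkpoint,
# the classifier head is freshly initialized.
model = BertWithCustomHead.from_pretrained("bert-base-uncased")
```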

🔧 Affected Symbols

  • flex_attention
  • Llama4
  • BertPreTrainedModel._init_weights
  • init_empty_weights
  • DeepSpeedEngine
  • HQQ