v4.51.1
📦 transformers
🐛 8 fixes · 🔧 6 symbols
Summary
This patch release focuses on stabilizing Llama 4 support and fixing compatibility issues with torch 2.6.0, DeepSpeed, and weight initialization.
Migration Steps
- Update the library to v4.51.1 with your package manager (e.g., pip install --upgrade transformers); a quick verification snippet follows this list.
- If you were experiencing issues related to flex attention with torch 2.6.0, these should now be resolved.
- If you were using HQQ with the caching allocator warmup, note that HQQ has been removed from this specific warmup process; review your initialization logic if this affects performance or startup time.
- If you encountered issues initializing weights when not using the accelerate library, these fixes should resolve them.
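To confirm the upgrade took effect, a minimal version check (assuming `packaging` is available, which it is as a standard transformers dependency):

```python
import transformers
from packaging import version

# Verify that the installed build is at least the patched release.
installed = version.parse(transformers.__version__)
assert installed >= version.parse("4.51.1"), f"still on {installed}; re-run the upgrade"
print(f"transformers {installed} OK")
```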
🐛 Bug Fixes
- Fixed flex attention compatibility for torch 2.6.0 (smoke-test sketch after this list)
- Resolved issues with post-training and general training for Llama 4
- Removed HQQ from caching allocator warmup
- Fixed _init_weights for derived BERT models
- Fixed initialization of empty weights when accelerate is not present (mechanism sketched at the end of this note)
- Fixed DeepSpeed integration with quantization
- Fixed flex attention when optional arguments are omitted
- General stability fixes for Llama 4
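As a smoke test for the flex attention and Llama 4 fixes, a minimal loading-and-generation sketch. The checkpoint id is illustrative (Llama 4 checkpoints are gated on the Hub), and `attn_implementation="flex_attention"` follows the general transformers loading API; neither detail comes from this release note itself:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # illustrative, gated checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flex_attention",  # exercises the patched torch 2.6.0 path
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```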
🔧 Affected Symbols
- flex_attention
- Llama4
- BertPreTrainedModel._init_weights
- init_empty_weights
- DeepSpeedEngine
- HQQ
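For context on the init_empty_weights fix above: when accelerate is unavailable, transformers falls back to creating parameters on the meta device, so no memory is allocated before real weights are loaded. A plain-PyTorch sketch of that mechanism (an illustration of the idea, not the library's actual code path):

```python
import torch
import torch.nn as nn

# Modules built under the "meta" device allocate no storage, so even a very
# large layer is instantiated instantly; real weights are materialized later.
with torch.device("meta"):
    layer = nn.Linear(4096, 4096)

print(layer.weight.device)  # meta
# Size the weight *would* occupy once materialized (~64 MiB at fp32):
print(layer.weight.nelement() * layer.weight.element_size())
```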