v4.51.2
📦 transformers
✨ 1 feature · 🐛 3 fixes · 🔧 4 symbols
Summary
A minor patch release focusing on Llama4 model corrections and the introduction of Attention Quantization with FBGemm and Tensor Parallelism.
✨ New Features
- Added support for Attention Quantization with FBGemm and Tensor Parallelism (TP).
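A minimal usage sketch of the new combination, assuming the existing FbgemmFp8Config quantization config and the tp_plan="auto" loading path; the checkpoint name is illustrative, and exact behavior may depend on your environment and version.

```python
# Sketch: FBGemm FP8 quantization combined with tensor parallelism.
# Assumes fbgemm-gpu is installed and the script is launched with one process
# per GPU, e.g. `torchrun --nproc-per-node 4 run.py`. Checkpoint name is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, FbgemmFp8Config

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # example only

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=FbgemmFp8Config(),  # quantize weights to FP8 via FBGemm
    tp_plan="auto",                         # shard layers across the launched ranks
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello from an FP8 + TP run:", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```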
🐛 Bug Fixes
- Fixed Llama4 offset calculation.
- Updated Llama4 to use rms_norm_eps as the epsilon for its L2Norm (see the sketch after this list).
- Explicitly marked Llama4 as unsupported with Flash Attention 2 (FA2).
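For reference on the rms_norm_eps change, here is a minimal sketch of an RMS-style L2 normalization that takes its epsilon from the model config; the class name and signature are illustrative, not the library's exact implementation.

```python
import torch
from torch import nn

class L2Norm(nn.Module):
    """Minimal RMS-style L2 normalization (illustrative, not the exact Llama4 module)."""

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        # After the fix, this epsilon is taken from config.rms_norm_eps
        # rather than a hardcoded default.
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the last dimension, guarded by eps.
        out = x.float() * torch.rsqrt(x.float().pow(2).mean(-1, keepdim=True) + self.eps)
        return out.type_as(x)

# Usage sketch: construct it from the config value.
# norm = L2Norm(eps=config.rms_norm_eps)
```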
🔧 Affected Symbols
- Llama4
- L2Norm
- FBGemm
- FlashAttention2