v4.51.2
📦 transformers
✨ 1 feature · 🐛 3 fixes · 🔧 4 symbols
Summary
A minor patch release focusing on Llama4 model corrections and the introduction of Attention Quantization with FBGemm and Tensor Parallelism.
✨ New Features
- Added support for Attention Quantization with FBGemm and Tensor Parallelism (TP).
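A minimal usage sketch of the new combination, assuming the existing FbgemmFp8Config quantization config and the tp_plan="auto" loading path; the checkpoint name is illustrative, and exact behavior may depend on your environment and version.

```python
# Sketch: FBGemm FP8 quantization combined with tensor parallelism.
# Assumes fbgemm-gpu is installed and the script is launched with one process
# per GPU, e.g. `torchrun --nproc-per-node 4 run.py`. Checkpoint name is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, FbgemmFp8Config

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # example only

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=FbgemmFp8Config(),  # quantize weights to FP8 via FBGemm
    tp_plan="auto",                         # shard layers across the launched ranks
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello from an FP8 + TP run:", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```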
🐛 Bug Fixes
- Fixed Llama4 offset calculation.
- Updated Llama4 to use rms_norm_eps as the epsilon for its L2Norm (see the sketch after this list).
- Explicitly marked Llama4 as unsupported with Flash Attention 2 (FA2).
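For reference on the rms_norm_eps change, here is a minimal sketch of an RMS-style L2 normalization that takes its epsilon from the model config; the class name and signature are illustrative, not the library's exact implementation.

```python
import torch
from torch import nn

class L2Norm(nn.Module):
    """Minimal RMS-style L2 normalization (illustrative, not the exact Llama4 module)."""

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        # After the fix, this epsilon is taken from config.rms_norm_eps
        # rather than a hardcoded default.
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the last dimension, guarded by eps.
        out = x.float() * torch.rsqrt(x.float().pow(2).mean(-1, keepdim=True) + self.eps)
        return out.type_as(x)

# Usage sketch: construct it from the config value.
# norm = L2Norm(eps=config.rms_norm_eps)
```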
🔧 Affected Symbols
- Llama4
- L2Norm
- FBGemm
- FlashAttention2