
v4.51.2

📦 transformers
✨ 1 feature · 🐛 3 fixes · 🔧 4 symbols

Summary

A minor patch release that corrects several Llama4 issues and introduces Attention Quantization with FBGemm under Tensor Parallelism.

✨ New Features

  • Added support for Attention Quantization with FBGemm under Tensor Parallelism (TP); a usage sketch follows below.
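
A minimal usage sketch, assuming the existing FbgemmFp8Config quantization config and the tp_plan loading argument; the checkpoint name and launch details are illustrative, not a definitive recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, FbgemmFp8Config

# Illustrative checkpoint; requires fbgemm-gpu to be installed.
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

# FP8 quantization backed by FBGemm kernels.
quant_config = FbgemmFp8Config()

# tp_plan="auto" shards supported layers across the GPUs visible to this
# process; launch with `torchrun --nproc-per-node=<num_gpus> script.py`.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    quantization_config=quant_config,
    tp_plan="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```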

🐛 Bug Fixes

  • Fixed the Llama4 offset calculation.
  • Updated Llama4 to use rms_norm_eps for its L2Norm; see the first sketch after this list.
  • Explicitly marked Llama4 as unsupported with Flash Attention 2 (FA2); a fallback sketch follows as well.
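
For the rms_norm_eps change, a minimal sketch of RMS-style L2 normalization, illustrative rather than the library's actual source; the function name, tensor shapes, and eps value are assumptions standing in for the model config:

```python
import torch

def l2_norm(x: torch.Tensor, eps: float) -> torch.Tensor:
    # RMS-style L2 normalization: x / sqrt(mean(x**2) + eps).
    # After this fix, eps is taken from config.rms_norm_eps instead of
    # a hard-coded default.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)

hidden = torch.randn(2, 8, 128)          # (batch, seq, dim), illustrative
normalized = l2_norm(hidden, eps=1e-5)   # 1e-5 stands in for rms_norm_eps
```

And since FA2 is now explicitly rejected for Llama4, a hedged sketch of requesting a supported attention backend instead; the checkpoint name is illustrative:

```python
from transformers import AutoModelForCausalLM

# Requesting attn_implementation="flash_attention_2" now raises for
# Llama4; fall back to a supported backend such as PyTorch SDPA.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",  # illustrative checkpoint
    attn_implementation="sdpa",
)
```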

🔧 Affected Symbols

  • Llama4
  • L2Norm
  • FBGemm
  • FlashAttention2