Changelog

b8609

📦 llama-cpp
✨ 1 feature · 🐛 3 fixes · 🔧 1 symbol

Summary

This release adds Flash Attention support on CUDA for a head dimension of 512, along with several fixes to the HIP tile kernels and Flash Attention (FA) compilation.
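For context, the "head dimension" is the per-head feature size in scaled dot-product attention, which is the operation Flash Attention computes efficiently. A minimal NumPy sketch of the math (illustrative only, not llama.cpp's CUDA kernel) with head dimension 512:

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention; the head dimension is the last axis.
    head_dim = q.shape[-1]
    scores = q @ k.T / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
seq, head_dim = 8, 512  # head dimension now covered by the CUDA FA path
q = rng.standard_normal((seq, head_dim))
k = rng.standard_normal((seq, head_dim))
v = rng.standard_normal((seq, head_dim))
out = attention(q, k, v)
print(out.shape)
```

Flash Attention kernels compute the same result in tiles without materializing the full `seq × seq` score matrix; supporting a larger head dimension mainly affects per-tile register and shared-memory budgets, which is what the HIP occupancy fix below addresses.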

✨ New Features

  • Added Flash Attention support for CUDA when the head dimension is 512.

🐛 Bug Fixes

  • Fixed the HIP tile kernel build for head dimension 512.
  • Fixed HIP tile kernel occupancy for head dimension 512 on AMD GPUs.
  • Fixed compilation of the tile Flash Attention kernels.

Affected Symbols