b8609
📦 llama-cpp
✨ 1 feature · 🐛 3 fixes · 🔧 1 symbol
Summary
This release introduces Flash Attention (FA) support for CUDA with a head dimension of 512, along with fixes to the HIP tile kernels and to tile FA compilation.
✨ New Features
- Added Flash Attention support for CUDA when the head dimension is 512.
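To illustrate what this feature computes, here is a minimal NumPy sketch of the tiled online-softmax algorithm that Flash Attention kernels implement, run with the head dimension of 512 this release adds. This is a conceptual model only, not the CUDA kernel or the llama.cpp API; all names in it are hypothetical.

```python
import numpy as np

def flash_attention(q, k, v, tile=64):
    """Tiled attention with an online softmax: K/V are streamed in
    blocks so the full (n_q, n_kv) score matrix is never materialized.
    Conceptual sketch only; the real kernel tiles in GPU shared memory."""
    n_q, d = q.shape
    n_kv = k.shape[0]
    scale = 1.0 / np.sqrt(d)
    o = np.zeros((n_q, v.shape[1]))  # running weighted-value accumulator
    m = np.full(n_q, -np.inf)        # running row maximum (for stability)
    l = np.zeros(n_q)                # running softmax denominator
    for start in range(0, n_kv, tile):
        kt = k[start:start + tile]
        vt = v[start:start + tile]
        s = (q @ kt.T) * scale                    # scores for this tile
        m_new = np.maximum(m, s.max(axis=1))
        p = np.exp(s - m_new[:, None])
        corr = np.exp(m - m_new)                  # rescale old accumulators
        l = l * corr + p.sum(axis=1)
        o = o * corr[:, None] + p @ vt
        m = m_new
    return o / l[:, None]

def reference_attention(q, k, v):
    """Naive attention that materializes the full score matrix."""
    s = (q @ k.T) / np.sqrt(q.shape[1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return p @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((32, 512))   # head dimension 512, as in this release
k = rng.standard_normal((256, 512))
v = rng.standard_normal((256, 512))
assert np.allclose(flash_attention(q, k, v), reference_attention(q, k, v))
```

The tiled and naive versions agree to floating-point tolerance; the point of the kernel-level work in this release is making this streaming scheme build and run efficiently when `d = 512`.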
🐛 Bug Fixes
- Fixed the HIP tile kernel build for head dimension 512.
- Fixed HIP tile kernel occupancy for head dimension 512 on AMD.
- Fixed tile Flash Attention compilation.