b8609
📦 llama-cpp
✨ 1 feature · 🐛 3 fixes · 🔧 1 symbol
Summary
This release introduces Flash Attention (FA) support for CUDA with a head dimension of 512, along with fixes to the HIP tile kernels and to tile FA compilation.
✨ New Features
- Added Flash Attention support for CUDA when the head dimension is 512.
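To illustrate what this feature computes, here is a minimal NumPy sketch of the tiled online-softmax algorithm that Flash Attention kernels implement, run with the head dimension of 512 this release adds. This is a conceptual model only, not the CUDA kernel or the llama.cpp API; all names in it are hypothetical.

```python
import numpy as np

def flash_attention(q, k, v, tile=64):
    """Tiled attention with an online softmax: K/V are streamed in
    blocks so the full (n_q, n_kv) score matrix is never materialized.
    Conceptual sketch only; the real kernel tiles in GPU shared memory."""
    n_q, d = q.shape
    n_kv = k.shape[0]
    scale = 1.0 / np.sqrt(d)
    o = np.zeros((n_q, v.shape[1]))  # running weighted-value accumulator
    m = np.full(n_q, -np.inf)        # running row maximum (for stability)
    l = np.zeros(n_q)                # running softmax denominator
    for start in range(0, n_kv, tile):
        kt = k[start:start + tile]
        vt = v[start:start + tile]
        s = (q @ kt.T) * scale                    # scores for this tile
        m_new = np.maximum(m, s.max(axis=1))
        p = np.exp(s - m_new[:, None])
        corr = np.exp(m - m_new)                  # rescale old accumulators
        l = l * corr + p.sum(axis=1)
        o = o * corr[:, None] + p @ vt
        m = m_new
    return o / l[:, None]

def reference_attention(q, k, v):
    """Naive attention that materializes the full score matrix."""
    s = (q @ k.T) / np.sqrt(q.shape[1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return p @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((32, 512))   # head dimension 512, as in this release
k = rng.standard_normal((256, 512))
v = rng.standard_normal((256, 512))
assert np.allclose(flash_attention(q, k, v), reference_attention(q, k, v))
```

The tiled and naive versions agree to floating-point tolerance; the point of the kernel-level work in this release is making this streaming scheme build and run efficiently when `d = 512`.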
🐛 Bug Fixes
- Fixed the HIP tile kernel build for head dimension 512.
- Fixed HIP tile kernel occupancy for head dimension 512 on AMD.
- Fixed tile Flash Attention compilation.