b8143
📦 llama-cpp — View on GitHub →
✨ 18 features · 🐛 11 fixes · 🔧 2 symbols
Summary
This release focuses heavily on refactoring and optimizing the Vulkan Scalar Flash Attention implementation, introducing fp16 support, improving synchronization, and applying numerous hardware-specific tuning fixes across AMD, Intel, and Nvidia platforms.
Migration Steps
- Scalar FA with Bc=4 is no longer a valid configuration and must be changed (the default Bc is now 32).
- Users on GCN AMD GPUs using the proprietary driver should note that f16 FA is now disabled.
✨ New Features
- Refactored the Vulkan scalar flash attention implementation.
- Enabled using fp16 in scalar flash attention shader.
- Split rows inside subgroups for faster synchronization in Vulkan FA.
- Added support for using f32 scalar FA when f16 is not supported by the device.
- Added support for a medium-rows FA shader Br size.
- Cached q values into registers for KQ computation.
- Fused lf accumulation, pf, and v accumulation into a single loop.
- Enabled staging K and V loads through shared memory (shmem) (only on Nvidia for V staging).
- Defaulted Bc to 32 for scalar FA.
- Enabled dynamic subgroups for Intel devices.
- Used vectorized stores.
- Used float_type for dequantize4 functions.
- Used a smaller scalar row size for smaller row counts.
- Relaxed flash attention split_k condition to allow non-gqa use.
- Used minimal subgroup size on Intel.
- Added Intel shader core count lookup-table.
- Allowed printing pipeline stats.
- Limited occupancy for GCN for small batch FA with large HSK.
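
The fused lf/pf/v accumulation above follows the standard online-softmax formulation of flash attention: the running max, the exp-weighted probabilities, and the V accumulator are all updated in one pass over the K/V columns. A minimal CPU-side C++ sketch of that idea (illustrative only; the function name and data layout are hypothetical, not the actual Vulkan shader code):

```cpp
#include <cmath>
#include <vector>

// One output row of attention for a single query, computed in a single
// fused loop: the running max (m), the running exp-sum (l, the "lf" term),
// the per-column probability (p, the "pf" term), and the V accumulator
// are all updated together instead of in separate passes.
std::vector<float> fused_fa_row(const std::vector<float>& scores,
                                const std::vector<std::vector<float>>& V) {
    const size_t d = V[0].size();
    float m = -INFINITY;              // running max of scores seen so far
    float l = 0.0f;                   // running sum of exp(score - m)
    std::vector<float> acc(d, 0.0f);  // running V accumulator

    for (size_t c = 0; c < scores.size(); ++c) {
        const float m_new = std::max(m, scores[c]);
        const float scale = std::exp(m - m_new);      // rescale old state
        const float p     = std::exp(scores[c] - m_new);
        l = l * scale + p;
        for (size_t i = 0; i < d; ++i) {
            acc[i] = acc[i] * scale + p * V[c][i];
        }
        m = m_new;
    }
    for (size_t i = 0; i < d; ++i) acc[i] /= l;       // final normalization
    return acc;
}
```

Because every state variable is rescaled by `exp(m - m_new)` when the running max changes, the result matches a naive softmax-then-matmul while needing only one loop over the columns, which is what makes fusing these accumulations into a single shader loop attractive.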
🐛 Bug Fixes
- Fixed AMD workgroup size issue in Vulkan FA.
- Optimized masksh use.
- Added padding to mask shmem buffer.
- Removed Bc=4 for scalar FA, which was an invalid configuration.
- Used wave32 on AMD RDNA for scalar FA.
- Fixed rebase issues.
- Fixed gqa opt logic.
- Fixed block_rows issue with small n_rows.
- Fixed hsk=72/80 issue.
- Fixed bad RDNA performance on head size <= 128 by limiting occupancy.
- Disabled f16 FA for GCN AMD GPUs on the proprietary driver.