
b8779

📦 llama-cpp
✨ 3 features · 🐛 3 fixes · 🔧 4 symbols

Summary

This release adds Vulkan Flash Attention DP4A support for quantized KV caches, using integer dot products, and includes several fixes to indexing and quantization checks in the Vulkan backend.

✨ New Features

  • Implemented a Vulkan Flash Attention DP4A shader for quantized KV caches using integer dot products.
  • Added support for more KV quant types in the Vulkan implementation.
  • Re-added fast paths for quantizations smaller than 8-bit.

🐛 Bug Fixes

  • Fixed SHMEM_STAGING indexing in the Vulkan implementation.
  • Fixed the mmq gate and shmem checks in the Vulkan implementation.
  • Added supported-quants checks to the Flash Attention tests.

Affected Symbols