b9580

📅 Jun 9, 2026 · 📦 llama-cpp

✨ 2 features · 🔧 3 symbols

Summary

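This release adds support for `v_dot2_f32_f16`, a two-element dot product that takes fp16 inputs and accumulates in fp32, to the Vulkan backend's matrix-matrix multiplication and Flash Attention kernels. The change is aimed at better performance on hardware that supports the operation. It also introduces a dot product abstraction that reduces preprocessor branching in the Vulkan implementation. Several platform builds were disabled pending further updates.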

Migration Steps

If you use Vulkan builds, make sure your hardware and drivers support the Valve fp16 dot2 extension to get the best performance.
Review any code paths that relied on the previous preprocessor branching for dot product calculations; that logic now sits behind a dot product abstraction.

✨ New Features

Added support for `v_dot2_f32_f16` in Vulkan matrix-matrix multiplication and Flash Attention.
Introduced a dot product abstraction to reduce preprocessor branching in the Vulkan implementation.
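
To make the two items above concrete, here is a minimal Python sketch of the idea. It is not the llama.cpp shader code. The names `dot2_fused`, `dot2_fallback`, `HAS_DOT2`, and `dot` are hypothetical, and the rounding behavior is an assumption made for illustration that may differ from real hardware. The sketch models a two-element fp16 dot product with fp32 accumulation and shows how choosing the implementation in one place keeps call sites free of branching.

```python
import struct


def to_f16(x: float) -> float:
    """Round a value to the nearest IEEE 754 half-precision number."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round a value to the nearest IEEE 754 single-precision number."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot2_fused(a0, a1, b0, b1, acc):
    """Model of a dot2 op: two fp16 products added to an fp32 accumulator.

    Assumption: the result is rounded once at the end. Real hardware may
    round intermediate results differently.
    """
    return to_f32(to_f16(a0) * to_f16(b0) + to_f16(a1) * to_f16(b1) + acc)


def dot2_fallback(a0, a1, b0, b1, acc):
    """Fallback path: two separate multiply-adds, each rounded to fp32."""
    acc = to_f32(to_f16(a0) * to_f16(b0) + acc)
    acc = to_f32(to_f16(a1) * to_f16(b1) + acc)
    return acc


# Hypothetical capability flag. In a shader this choice would be the single
# remaining preprocessor branch instead of one branch per call site.
HAS_DOT2 = True
DOT2 = dot2_fused if HAS_DOT2 else dot2_fallback


def dot(a, b):
    """Dot product of two even-length sequences, two elements per step."""
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("inputs must have the same even length")
    acc = 0.0
    for i in range(0, len(a), 2):
        acc = DOT2(a[i], a[i + 1], b[i], b[i + 1], acc)
    return acc
```

Callers use only `dot`, so switching between the fused and fallback paths means changing `HAS_DOT2` and nothing else. The two paths can give slightly different results for inputs whose products are not exactly representable, which is expected when the number of rounding steps differs.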

Affected Symbols

vulkan
matrix-matrix multiplication
Flash Attention