b8317

📅 Mar 13, 2026 · 📦 llama-cpp

✨ 2 features · 🐛 2 fixes · 🔧 2 symbols

Summary

This release adds support for the GATED_DELTA_NET operation as a Vulkan compute shader, together with performance optimizations for that shader. It also includes fixes to shader macros, buffer load casting, and Q/K broadcast logic.

✨ New Features

Added support for the fused gated delta net recurrence operation (GATED_DELTA_NET) as a Vulkan compute shader, including support for scalar gate, KDA vector gate, GQA broadcast, multi-token sequences, and permuted q/k inputs.
Implemented performance optimizations for the GATED_DELTA_NET Vulkan shader: using the dp4 hardware intrinsic for vec4 dot products, caching exp(g) in shared memory for the KDA path, and fusing the decay and rank-1 update steps.
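
As a rough illustration of what "fusing the decay and rank-1 update" can mean, the sketch below runs one token of a gated delta rule recurrence in two ways: as three separate passes over the state, and as a single fused pass. It is not taken from the shader. The function names, the `beta` write strength, the `d_k x d_v` state orientation, and the lack of any q scaling are all assumptions made for the example. A vector gate `g` (one value per key dimension) stands in for the KDA path; a scalar gate is the case where every entry of `g` is equal.

```python
import math

def gdn_step_unfused(S, q, k, v, g, beta):
    """One token, written as separate passes over the d_k x d_v state S."""
    d_k, d_v = len(S), len(S[0])
    # Pass 1: decay row i of the state by exp(g[i]).
    S = [[math.exp(g[i]) * S[i][j] for j in range(d_v)] for i in range(d_k)]
    # Pass 2: delta = beta * (v - S^T k), the error of the decayed state at key k.
    delta = [beta * (v[j] - sum(S[i][j] * k[i] for i in range(d_k)))
             for j in range(d_v)]
    # Pass 3: rank-1 update, S += outer(k, delta).
    S = [[S[i][j] + k[i] * delta[j] for j in range(d_v)] for i in range(d_k)]
    # Read out with the query: out = S^T q.
    out = [sum(S[i][j] * q[i] for i in range(d_k)) for j in range(d_v)]
    return S, out

def gdn_step_fused(S, q, k, v, g, beta):
    """Same result, but decay and rank-1 update are applied in one state pass."""
    d_k, d_v = len(S), len(S[0])
    # exp(g) is computed once and reused (hypothetical analogue of caching it).
    decay = [math.exp(gi) for gi in g]
    delta = [beta * (v[j] - sum(decay[i] * S[i][j] * k[i] for i in range(d_k)))
             for j in range(d_v)]
    # Fused: S_new = decay * S + outer(k, delta), one write per state element.
    S = [[decay[i] * S[i][j] + k[i] * delta[j] for j in range(d_v)]
         for i in range(d_k)]
    out = [sum(S[i][j] * q[i] for i in range(d_k)) for j in range(d_v)]
    return S, out
```

Both functions should agree up to floating-point rounding; the fused form simply avoids writing the decayed state back before updating it.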

🐛 Bug Fixes

Fixed Q/K broadcast logic for interleaved head layout to adapt to the convention from #20340 (head_id / rq1 → head_id % neq1).
Ensured correct behavior across all Vulkan configurations by adding explicit FLOAT_TYPE casts for data_q, data_k, and data_g buffer loads.
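
To make the Q/K broadcast fix concrete, here is a small sketch of the two index mappings named in that note. It assumes `neq1` is the number of Q/K heads and `rq1` is the number of state heads that share one Q/K head; those readings, along with the function names and the `n_heads` parameter, are assumptions for illustration and are not taken from the shader source.

```python
def qk_head_blocked(head_id, n_heads, neq1):
    """Previous mapping (head_id / rq1): consecutive heads share a Q/K head."""
    rq1 = n_heads // neq1  # assumed meaning: heads per Q/K head
    return head_id // rq1

def qk_head_interleaved(head_id, neq1):
    """Mapping after the fix (head_id % neq1): Q/K heads repeat cyclically."""
    return head_id % neq1

n_heads, neq1 = 8, 2
blocked = [qk_head_blocked(h, n_heads, neq1) for h in range(n_heads)]
interleaved = [qk_head_interleaved(h, neq1) for h in range(n_heads)]
print(blocked)      # [0, 0, 0, 0, 1, 1, 1, 1]
print(interleaved)  # [0, 1, 0, 1, 0, 1, 0, 1]
```

The two mappings only coincide when there is a single Q/K head or no broadcast at all, which is why reading an interleaved layout with the blocked formula pairs heads with the wrong Q/K data.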

Affected Symbols

Vulkan compute shader GATED_DELTA_NET op