b8317
📦 llama-cpp
✨ 2 features · 🐛 2 fixes · 🔧 2 symbols
Summary
This release adds a Vulkan compute shader for the GATED_DELTA_NET operation, together with performance optimizations for that shader. It also includes fixes related to shader macros, buffer casting, and Q/K broadcast logic.
✨ New Features
- Added a Vulkan compute shader for the fused gated delta net recurrence operation (GATED_DELTA_NET). It supports scalar gates, KDA vector gates, GQA broadcast, multi-token sequences, and permuted q/k inputs.
- Optimized the GATED_DELTA_NET Vulkan shader by using the dp4 hardware intrinsic for vec4 dot products, caching exp(g) in shared memory for the KDA path, and fusing the decay and rank-1 update into a single pass.
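For readers unfamiliar with the operation, the following is a minimal single-head reference sketch of a gated delta rule recurrence in plain Python. It is not the Vulkan shader and is not taken from the llama.cpp source. The function name `gated_delta_step`, the d_k x d_v state layout, the interpretation of `g` as a log-decay, and the absence of any query scaling are all assumptions made for illustration. A float `g` stands in for the scalar gate, and a per-channel list stands in for the vector-gate (KDA) case.

```python
import math

def gated_delta_step(S, q, k, v, g, beta):
    """One token of a gated delta rule for a single head (illustrative only).

    S    : d_k x d_v state matrix as a list of rows, updated in place
    q, k : length d_k vectors
    v    : length d_v vector
    g    : log-decay, either a float (scalar gate) or a length d_k list
           (per-channel gate); this interpretation is an assumption
    beta : write strength, typically in [0, 1]
    """
    d_k, d_v = len(k), len(v)
    if isinstance(g, (int, float)):
        decay = [math.exp(g)] * d_k
    else:
        decay = [math.exp(x) for x in g]  # computed once, reused for every column

    # What the decayed state currently predicts for this key.
    pred = [sum(decay[i] * S[i][j] * k[i] for i in range(d_k)) for j in range(d_v)]
    # Error between the target value and the prediction, scaled by beta.
    delta = [beta * (v[j] - pred[j]) for j in range(d_v)]

    # Decay and rank-1 update applied in one pass over the state.
    for i in range(d_k):
        for j in range(d_v):
            S[i][j] = decay[i] * S[i][j] + k[i] * delta[j]

    # Read out with the query.
    return [sum(S[i][j] * q[i] for i in range(d_k)) for j in range(d_v)]


def gated_delta_sequence(S, qs, ks, vs, gs, betas):
    """Apply the step to a multi-token sequence, returning one output per token."""
    return [gated_delta_step(S, q, k, v, g, b)
            for q, k, v, g, b in zip(qs, ks, vs, gs, betas)]
```

In this sketch, computing `decay` once per token mirrors the idea of caching exp(g), and the single double loop mirrors the fused decay plus rank-1 update. How the actual shader distributes this work across invocations is not described in these notes.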
🐛 Bug Fixes
- Fixed Q/K broadcast logic for the interleaved head layout to follow the convention from #20340: the Q/K head index is now computed as head_id % neq1 instead of head_id / rq1.
- Ensured correct behavior across all Vulkan configurations by adding explicit FLOAT_TYPE casts for data_q, data_k, and data_g buffer loads.
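The difference between the two Q/K broadcast mappings mentioned above can be illustrated with a small sketch. The names `rq1` and `neq1` come from the notes, but their meanings here are assumptions: `neq1` is taken to be the number of Q/K heads and `rq1` the number of value heads per Q/K head. The helper function names are hypothetical.

```python
def qk_head_grouped(head_id, rq1):
    """Previous mapping: consecutive value heads share one Q/K head."""
    return head_id // rq1

def qk_head_interleaved(head_id, neq1):
    """Current mapping: value heads cycle through the Q/K heads."""
    return head_id % neq1

# Hypothetical configuration: 4 value heads broadcast over 2 Q/K heads.
n_v_heads, neq1 = 4, 2
rq1 = n_v_heads // neq1

grouped = [qk_head_grouped(h, rq1) for h in range(n_v_heads)]
interleaved = [qk_head_interleaved(h, neq1) for h in range(n_v_heads)]
print(grouped)      # [0, 0, 1, 1]
print(interleaved)  # [0, 1, 0, 1]
```

Both mappings assign each Q/K head to the same number of value heads. They differ only in which value heads share a Q/K head, which is why a mismatch with the layout convention produces wrong results rather than an out-of-bounds access.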