Change8

b8317

📦 llama-cpp
2 features 🐛 2 fixes 🔧 2 symbols

Summary

This release adds a Vulkan compute shader for the GATED_DELTA_NET operation, together with performance optimizations for it. It also includes fixes related to shader macros, buffer casting, and Q/K broadcast logic.

✨ New Features

  • Added support for the fused gated delta net recurrence operation (GATED_DELTA_NET) as a Vulkan compute shader, including support for scalar gate, KDA vector gate, GQA broadcast, multi-token sequences, and permuted q/k inputs.
  • Implemented performance optimizations for the GATED_DELTA_NET Vulkan shader, including using the dp4 hardware intrinsic for vec4 dot products, caching exp(g) in shared memory for the KDA path, and fusing the decay and rank-1 update into a single step.
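
For readers unfamiliar with the operation, the following is a minimal CPU sketch of one token of a gated delta recurrence. It assumes the common formulation from the gated delta net literature (decay the state, predict v from k, write back the scaled error as a rank-1 update, then read out with q). The function name `gated_delta_step`, the state layout, and the absence of any q scaling are assumptions for illustration, not details taken from the shader.

```python
import math

def gated_delta_step(S, q, k, v, g, beta):
    """One token of a gated delta rule on a d_k x d_v state S (list of rows).

    g is a log-space gate: a single float (scalar gate) or a list of d_k
    floats (per-key-channel vector gate, analogous to the KDA path).
    Hypothetical reference code; the real shader may order or scale things
    differently.
    """
    d_k, d_v = len(k), len(v)
    gates = g if isinstance(g, list) else [g] * d_k
    decay = [math.exp(x) for x in gates]  # exp(g) computed once and reused

    # Predict v from the decayed state: kv[j] = sum_i decay[i] * S[i][j] * k[i]
    kv = [sum(decay[i] * S[i][j] * k[i] for i in range(d_k)) for j in range(d_v)]
    delta = [beta * (v[j] - kv[j]) for j in range(d_v)]

    # Decay and rank-1 update applied in a single pass over the state.
    for i in range(d_k):
        for j in range(d_v):
            S[i][j] = decay[i] * S[i][j] + k[i] * delta[j]

    # Read out: o[j] = sum_i S[i][j] * q[i]
    return [sum(S[i][j] * q[i] for i in range(d_k)) for j in range(d_v)]

# Usage: write v under key k, then read it back with q == k.
state = [[0.0, 0.0], [0.0, 0.0]]
out = gated_delta_step(state, q=[1.0, 0.0], k=[1.0, 0.0], v=[3.0, 4.0], g=0.0, beta=1.0)
print(out)
```

Passing a list for `g` exercises the vector-gate case, where each key channel decays at its own rate. Multi-token sequences would call this step once per token while carrying `state` forward.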

🐛 Bug Fixes

  • Fixed Q/K broadcast logic for the interleaved head layout to match the convention from #20340 (the Q/K head index changes from head_id / rq1 to head_id % neq1).
  • Ensured correct behavior across all Vulkan configurations by adding explicit FLOAT_TYPE casts for data_q, data_k, and data_g buffer loads.
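
The difference between the two Q/K index formulas in the broadcast fix can be seen with a small sketch. It assumes that `neq1` is the number of Q/K heads and `rq1` is the broadcast ratio (state heads divided by Q/K heads); those readings, and the helper names, are assumptions based on the identifiers in the note rather than on the shader source.

```python
def qk_head_grouped(head_id, rq1):
    # Old mapping: consecutive heads share one Q/K head (integer division).
    return head_id // rq1

def qk_head_interleaved(head_id, neq1):
    # New mapping: Q/K heads repeat cyclically across heads (modulo).
    return head_id % neq1

n_heads, neq1 = 8, 2      # hypothetical sizes for illustration
rq1 = n_heads // neq1     # assumed broadcast ratio

grouped = [qk_head_grouped(h, rq1) for h in range(n_heads)]
interleaved = [qk_head_interleaved(h, neq1) for h in range(n_heads)]

print(grouped)      # [0, 0, 0, 0, 1, 1, 1, 1]
print(interleaved)  # [0, 1, 0, 1, 0, 1, 0, 1]
```

Both mappings cover every Q/K head the same number of times, but they pair different heads together, so using the wrong one reads valid but mismatched Q/K data.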

Affected Symbols