Change8

b9084

📦 llama-cpp
6 features 🔧 2 symbols

Summary

This release speeds up the Gated Delta Net recurrence on Qualcomm Hexagon (HVX) hardware, adding specialized kernels for both the prompt-processing and token-generation paths. It also ships updated pre-compiled binaries for a wide range of operating systems and hardware configurations.
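
For reference, the Gated DeltaNet recurrence is commonly written, per head and per token, as S_t = a_t·S_{t-1} + b_t·(v_t − a_t·S_{t-1}·k_t)·k_tᵀ with output o_t = S_t·q_t, where a_t = exp(g_t) is the decay gate and b_t is the write strength. The C sketch below is a scalar reference under stated assumptions, not the ggml or HVX code: the row-major d_v x d_k state layout, the scalar gate passed in already exponentiated, and the function name gated_delta_step are hypothetical, and details such as q/k normalization or scaling are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for one head and one token of the gated
 * delta rule. Assumptions (not taken from the ggml source): S is a row-major
 * d_v x d_k state, decay = exp(g) is a scalar gate precomputed by the caller,
 * and beta is a scalar write strength. Each row i of S is updated on its own:
 *   S[i,:] <- decay * S[i,:]
 *   delta   = beta * (v[i] - dot(S[i,:], k))
 *   S[i,:] <- S[i,:] + delta * k
 *   o[i]    = dot(S[i,:], q)
 */
static void gated_delta_step(float *S, size_t d_v, size_t d_k,
                             const float *q, const float *k, const float *v,
                             float decay, float beta, float *o) {
    assert(S != NULL && q != NULL && k != NULL && v != NULL && o != NULL);
    for (size_t i = 0; i < d_v; i++) {
        float *row = S + i * d_k;
        float kv = 0.0f;
        /* Decay the row, then read back the value it currently stores for k. */
        for (size_t j = 0; j < d_k; j++) {
            row[j] *= decay;
            kv += row[j] * k[j];
        }
        /* Error between the target value and the recalled value. */
        const float delta = beta * (v[i] - kv);
        float out = 0.0f;
        /* Rank-1 correction along k, then read out the row with q. */
        for (size_t j = 0; j < d_k; j++) {
            row[j] += delta * k[j];
            out += row[j] * q[j];
        }
        o[i] = out;
    }
}
```

With decay = 1, beta = 1 and a unit-length k, a second call with the same k and v leaves the output unchanged: the state already maps k to v, so delta is zero. That error-correcting write is what separates the delta rule from a plain additive linear-attention update.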

✨ New Features

  • Added HTP kernel support for the GGML_OP_GATED_DELTA_NET operation.
  • Implemented 4-row fused kernels for the Prompt Processing (PP) path of Gated Delta Net on HVX.
  • Implemented 8-row fused kernels for the Token Generation (TG) path of Gated Delta Net on HVX, halving the overhead of reloading the K, Q and gate vectors.
  • Introduced separate PP and TG thread functions for I-cache isolation.
  • Added a VTCM state scratchpad with DMA in/out, giving the TG path single-cycle access to the state.
  • Vectorized the gate exponential using hvx_exp_f32.
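
The row fusion in the PP and TG bullets can be pictured with a scalar C sketch. Each row of the recurrent state is updated independently once k, q and the gate are known, so a kernel can process R rows per pass and read each element of k and q once per block instead of once per row; moving from R = 4 to R = 8 halves the number of passes, which is consistent with the 2x reload reduction listed for the TG kernels. This is a hypothetical illustration, not the HVX kernel: it uses plain C loops instead of HVX intrinsics, assumes a row-major d_v x d_k state with a scalar, precomputed decay, and the names gated_delta_step_fused and MAX_FUSED_ROWS are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on fused rows, mirroring the 8-row TG kernel. */
#define MAX_FUSED_ROWS 8

/* Hypothetical scalar sketch of row fusion for one token of the gated delta
 * rule. Assumed layout: S is a row-major d_v x d_k state; decay = exp(g) is
 * precomputed; beta is a scalar write strength. R rows share every read of
 * k[j] and q[j]. Returns the number of row blocks, i.e. how many times the
 * shared k/q inputs are streamed through. */
static size_t gated_delta_step_fused(float *S, size_t d_v, size_t d_k,
                                     const float *q, const float *k, const float *v,
                                     float decay, float beta, float *o, size_t R) {
    assert(R > 0 && R <= MAX_FUSED_ROWS && d_v % R == 0);
    size_t blocks = 0;
    for (size_t i0 = 0; i0 < d_v; i0 += R, blocks++) {
        float kv[MAX_FUSED_ROWS] = {0};
        float out[MAX_FUSED_ROWS] = {0};
        float delta[MAX_FUSED_ROWS];

        /* Pass 1: decay R rows and project them onto k; k[j] is read once per block. */
        for (size_t j = 0; j < d_k; j++) {
            const float kj = k[j];
            for (size_t r = 0; r < R; r++) {
                float *row = S + (i0 + r) * d_k;
                row[j] *= decay;
                kv[r] += row[j] * kj;
            }
        }

        /* Per-row delta-rule error term. */
        for (size_t r = 0; r < R; r++)
            delta[r] = beta * (v[i0 + r] - kv[r]);

        /* Pass 2: rank-1 update and readout; k[j] and q[j] are shared by all R rows. */
        for (size_t j = 0; j < d_k; j++) {
            const float kj = k[j], qj = q[j];
            for (size_t r = 0; r < R; r++) {
                float *row = S + (i0 + r) * d_k;
                row[j] += delta[r] * kj;
                out[r] += row[j] * qj;
            }
        }

        for (size_t r = 0; r < R; r++)
            o[i0 + r] = out[r];
    }
    return blocks;
}
```

With d_v = 16, for example, R = 4 streams k and q through four blocks and R = 8 through two, while each row still sees the same sequence of arithmetic, so the outputs agree up to floating-point rounding.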

Affected Symbols