b8754

📅 Apr 11, 2026 · 📦 llama-cpp

✨ 10 features · 🐛 8 fixes · ⚡ 1 deprecation · 🔧 16 symbols

Summary

This release significantly improves performance of the hexagon backend through op request batching, a rewrite of buffer management, and explicit L2 cache control. It also removes the deprecated GGML_HEXAGON_EXPERIMENTAL environment variable.

Migration Steps

If you were setting the GGML_HEXAGON_EXPERIMENTAL environment variable, remove it. If you need to disable specific Ops, use GGML_HEXAGON_OPFILTER instead.
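For launch scripts that set the old variable, the migration can be sketched as a small environment rewrite. This is a minimal sketch, not part of llama.cpp: the helper name `migrate_hexagon_env` is hypothetical, and the pattern value is a placeholder because these notes do not specify the filter syntax.

```python
import os


def migrate_hexagon_env(env, op_pattern):
    """Return a copy of env with the removed variable dropped and the op filter set.

    migrate_hexagon_env is a hypothetical helper for launch scripts.
    op_pattern is a placeholder: these release notes do not document
    the filter syntax, so check the backend documentation for real values.
    """
    new_env = dict(env)
    # GGML_HEXAGON_EXPERIMENTAL was removed in this release, so drop it.
    new_env.pop("GGML_HEXAGON_EXPERIMENTAL", None)
    # Only set the filter when there is something to disable.
    if op_pattern:
        new_env["GGML_HEXAGON_OPFILTER"] = op_pattern
    return new_env


# Build an environment for a child process from the current one.
child_env = migrate_hexagon_env(os.environ, "<op-pattern>")
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when starting the program that uses the hexagon backend.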

✨ New Features

Introduced op request batching in hexagon, where the host prepares batches of requests dispatched via a single dspqueue message.
Rewrote buffer management in hexagon, mapping buffers explicitly by NPU while processing batches.
Added support for allocating shared/pinned buffers for opreqs.
Made opbatches configurable.
Implemented shared buffer usage for packing opbatches.
Added support for a vmem limit for op batching.
Added support for dynamic mmap/unmap on the HTP side for opbatching.
Introduced internal types in hex-ops and temporarily disabled src1 reuse.
Moved request batch handling into the session for cleaner dspqueue buffer usage.
Added a simple opfilter regex for debugging hexagon ops.
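The batching idea in the items above (requests collected on the host, sent as one message, bounded by a configurable batch size and a vmem limit) can be illustrated with a small model. This is a conceptual sketch only, not the hexagon backend's code: `OpRequest`, `Batcher`, `max_ops`, `vmem_limit`, and the byte counts are all hypothetical, and a Python list stands in for the dspqueue.

```python
from dataclasses import dataclass, field


@dataclass
class OpRequest:
    name: str
    nbytes: int  # hypothetical: memory that must be mapped for this op


@dataclass
class Batcher:
    max_ops: int     # hypothetical batch size limit
    vmem_limit: int  # hypothetical limit on mapped bytes per batch
    sent: list = field(default_factory=list)     # one entry per "message"
    pending: list = field(default_factory=list)  # requests not yet sent
    pending_bytes: int = 0

    def submit(self, req):
        # Send the current batch first if this request would exceed a limit.
        too_many = len(self.pending) >= self.max_ops
        too_big = self.pending_bytes + req.nbytes > self.vmem_limit
        if self.pending and (too_many or too_big):
            self.flush()
        self.pending.append(req)
        self.pending_bytes += req.nbytes

    def flush(self):
        # The whole batch goes out as a single message.
        if self.pending:
            self.sent.append([r.name for r in self.pending])
            self.pending = []
            self.pending_bytes = 0


batcher = Batcher(max_ops=2, vmem_limit=100)
for req in [OpRequest("a", 40), OpRequest("b", 40),
            OpRequest("c", 90), OpRequest("d", 20)]:
    batcher.submit(req)
batcher.flush()
```

In this run, four requests go out as three messages: `a` and `b` share one because of the size limit of two, and `c` and `d` are split because together they would exceed the memory limit.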

🐛 Bug Fixes

Disabled L2 bypass in hex-dma to work around an issue caused by missing flushes between Ops.
Fixed errors in hex-opreq debug messages.
Reverted a change that prevented flushing or invalidating cache lines beyond the buffer boundary.
Fixed softmax for non-aligned tensors and cleaned up vtcm allocation in hexagon.
Fixed src1 handling in act ops and fixed empty src1 handling in swiglu and related ops.
Fixed minor vtcm and dma handling issues in matmul.
Fixed hvx fallback path in hex-mm.
Fixed an issue where newer LLVM versions merged repack and non-repack functions, by using pointer differentiation for buffer types.

Affected Symbols

hexagon hex-dma hex-utils hex-opreq htp-opreq hex-l2flush hex-mm hex-ops hex-cumsum hex-bufs hex-buf hex-opbatch hex-naming hex-act hex-mmap ggml_hexagon_shared_buffer

⚡ Deprecations

The GGML_HEXAGON_EXPERIMENTAL environment variable is removed as it is no longer useful. Use GGML_HEXAGON_OPFILTER instead to disable Ops for debugging or validation.