b8754
📦 llama-cpp
✨ 10 features · 🐛 8 fixes · ⚡ 1 deprecation · 🔧 16 symbols
Summary
This release significantly improves Hexagon backend performance through op request batching, a rewrite of buffer management, and explicit L2 cache control. It also removes the deprecated GGML_HEXAGON_EXPERIMENTAL environment variable.
Migration Steps
- If you were using the GGML_HEXAGON_EXPERIMENTAL environment variable, switch to using GGML_HEXAGON_OPFILTER to disable specific Ops if necessary.
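To make the migration concrete, here is a minimal sketch of how a regex-based op filter can behave. It assumes that GGML_HEXAGON_OPFILTER holds a regular expression matched against op names and that a match disables the op; the release notes do not spell out the exact syntax or matching rules, so treat the helper `make_op_filter` and the op names below as hypothetical illustrations and not as the backend's implementation.

```python
import os
import re


def make_op_filter(pattern):
    """Return a predicate that is True when an op name should be disabled.

    Assumption: an empty or unset pattern disables nothing, and a pattern
    matches if it is found anywhere in the op name.
    """
    if not pattern:
        return lambda name: False
    rx = re.compile(pattern)
    return lambda name: rx.search(name) is not None


# Hypothetical usage: read the variable and decide which ops stay enabled.
is_disabled = make_op_filter(os.environ.get("GGML_HEXAGON_OPFILTER", ""))
candidate_ops = ["MUL_MAT", "SOFT_MAX", "ADD"]  # illustrative op names
enabled_ops = [op for op in candidate_ops if not is_disabled(op)]
```

Under these assumptions, a pattern such as `SOFT_MAX|ADD` would disable those two ops and leave `MUL_MAT` enabled, which is the kind of per-op control the removed variable could not provide.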
✨ New Features
- Introduced op request batching in hexagon: the host prepares batches of op requests, and each batch is dispatched via a single dspqueue message.
- Rewrote buffer management in hexagon: buffers are now mapped explicitly by the NPU while it processes batches.
- Added support for allocating shared/pinned buffers for opreqs.
- Made opbatches configurable.
- Implemented shared buffer usage for packing opbatches.
- Added support for vmem limit for op batching.
- Added support for dynamic mmap/unmap on the HTP side for opbatching.
- Introduced internal types in hex-ops and temporarily disabled src1 reuse.
- Moved request batch handling into the session for cleaner dspqueue buffer usage.
- Added a simple opfilter regex for debugging hexagon ops.
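The batching idea in the list above can be sketched as follows. This is a conceptual illustration only, not the hexagon backend's code: the class names, the `send` callback standing in for a single dspqueue message, and the reading of the vmem limit as a cap on the total buffer bytes per batch are all assumptions.

```python
class OpRequest:
    def __init__(self, op, nbytes):
        self.op = op          # op name, illustrative
        self.nbytes = nbytes  # total size of the buffers this op touches


class OpBatcher:
    """Collect op requests and hand each full batch to `send` in one call."""

    def __init__(self, send, max_ops=4, vmem_limit=1 << 20):
        self.send = send              # stands in for one queue message
        self.max_ops = max_ops        # assumed configurable batch size
        self.vmem_limit = vmem_limit  # assumed cap on mapped bytes per batch
        self.pending = []
        self.pending_bytes = 0

    def submit(self, req):
        # Flush first if adding this request would exceed either limit.
        too_many = len(self.pending) >= self.max_ops
        too_big = self.pending_bytes + req.nbytes > self.vmem_limit
        if self.pending and (too_many or too_big):
            self.flush()
        self.pending.append(req)
        self.pending_bytes += req.nbytes

    def flush(self):
        # Dispatch whatever is pending as a single batch, then reset.
        if self.pending:
            self.send(list(self.pending))
            self.pending = []
            self.pending_bytes = 0
```

The point of the design is that the per-message overhead is paid once per batch instead of once per op, while the size and memory limits keep any single batch bounded.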
🐛 Bug Fixes
- Disabled L2 bypass in hex-dma to work around an issue caused by missing flushes between Ops.
- Fixed errors in hex-opreq debug messages.
- Reverted a change that prevented flushing or invalidating cache lines beyond buffer boundary.
- Fixed softmax for non-aligned tensors and cleaned up VTCM allocation in hexagon.
- Fixed src1 handling in act ops and fixed empty src1 handling in swiglu and related ops.
- Fixed minor VTCM and DMA handling issues in matmul.
- Fixed the HVX fallback path in hex-mm.
- Fixed an issue where newer LLVM versions merged the repack and non-repack functions, by using pointer differentiation for buffer types.
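As general background for the cache-line fix in the list above (and not taken from the hexagon sources): cache maintenance operates on whole cache lines, so flushing or invalidating a buffer that is not line-aligned necessarily touches bytes just outside it. The sketch below shows the arithmetic; the 64-byte line size and the helper name `cache_line_span` are assumptions for illustration.

```python
def cache_line_span(addr, size, line=64):
    """Return the [start, end) range of whole cache lines covering a buffer.

    `line` must be a power of two; 64 bytes is an assumed value.
    """
    start = addr & ~(line - 1)                      # round down to a line
    end = (addr + size + line - 1) & ~(line - 1)    # round up to a line
    return start, end


# A 40-byte buffer at address 100 occupies [100, 140), but the lines that
# cover it span [64, 192), which extends past both ends of the buffer.
span = cache_line_span(100, 40)
```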
Affected Symbols
⚡ Deprecations
- The GGML_HEXAGON_EXPERIMENTAL environment variable is removed as it is no longer useful. Use GGML_HEXAGON_OPFILTER instead to disable Ops for debugging or validation.