b8754

📦 llama-cpp
✨ 10 features · 🐛 8 fixes · ⚡ 1 deprecation · 🔧 16 symbols

Summary

This release significantly improves hexagon performance through op request batching, a rewrite of buffer management, and explicit L2 cache control. It also removes the GGML_HEXAGON_EXPERIMENTAL environment variable, which is no longer useful.

Migration Steps

  1. If you set the GGML_HEXAGON_EXPERIMENTAL environment variable, remove it. To disable specific Ops, use GGML_HEXAGON_OPFILTER instead.
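
To catch leftover configuration during migration, a launcher or test harness could check for the removed variable at startup. The following is a minimal sketch: only the two environment variable names come from these notes, while the function name and message text are hypothetical and not part of llama.cpp.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: reports stale hexagon settings found in the environment.
// Only the variable names are taken from the release notes.
std::vector<std::string> hexagon_env_warnings() {
    std::vector<std::string> warnings;
    if (std::getenv("GGML_HEXAGON_EXPERIMENTAL") != nullptr) {
        warnings.push_back(
            "GGML_HEXAGON_EXPERIMENTAL is set but was removed in this release; "
            "use GGML_HEXAGON_OPFILTER to disable specific ops instead");
    }
    return warnings;
}
```

A caller would print each returned string before initializing the backend. An empty result means no stale setting was found.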

✨ New Features

  • Introduced op request batching in hexagon, where the host prepares batches of requests dispatched via a single dspqueue message.
  • Rewrote buffer management in hexagon, mapping buffers explicitly by NPU while processing batches.
  • Added support for allocating shared/pinned buffers for opreqs.
  • Made opbatches configurable.
  • Implemented shared buffer usage for packing opbatches.
  • Added support for vmem limit for op batching.
  • Added support for dynamic mmap/unmap on the HTP side for opbatching.
  • Introduced internal types in hex-ops and temporarily disabled src1 reuse.
  • Moved request batch handling into the session for cleaner dspqueue buffer usage.
  • Added a very simple opfilter regex for debugging hexagon ops.
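
The batching items above can be illustrated with a small host-side sketch. It is not the hexagon backend's code: the `OpReq` and `OpBatcher` types, the two limits, and the `sent_` vector that stands in for one queue message per batch are all assumptions made for illustration. The idea shown is that requests accumulate until either a configurable op count or a memory budget would be exceeded, and then the whole batch goes out as a single message.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical op request: an opcode plus the buffer bytes the NPU would
// have to map in order to execute it.
struct OpReq {
    uint32_t op;
    size_t   mapped_bytes;
};

// Hypothetical batcher. Each entry of sent_ models one dispatched message.
class OpBatcher {
public:
    OpBatcher(size_t max_ops, size_t vmem_limit)
        : max_ops_(max_ops), vmem_limit_(vmem_limit) {}

    // Queue one request, flushing first if adding it would break a limit.
    void push(const OpReq & req) {
        const bool full = pending_.size() >= max_ops_;
        const bool over = !pending_.empty() &&
                          pending_bytes_ + req.mapped_bytes > vmem_limit_;
        if (full || over) {
            flush();
        }
        pending_.push_back(req);
        pending_bytes_ += req.mapped_bytes;
    }

    // Dispatch everything queued so far as one batch (no-op when empty).
    void flush() {
        if (pending_.empty()) {
            return;
        }
        sent_.push_back(pending_);
        pending_.clear();
        pending_bytes_ = 0;
    }

    const std::vector<std::vector<OpReq>> & sent() const { return sent_; }

private:
    size_t max_ops_;
    size_t vmem_limit_;
    size_t pending_bytes_ = 0;
    std::vector<OpReq> pending_;
    std::vector<std::vector<OpReq>> sent_;
};
```

A single request larger than the memory budget is still sent, alone in its own batch. A real implementation would need to decide whether to reject it or fall back to another path.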

🐛 Bug Fixes

  • Disabled l2 bypass in hex-dma to work around an issue caused by missing flushes between Ops.
  • Fixed errors in hex-opreq debug messages.
  • Reverted a change that prevented flushing or invalidating cache lines beyond buffer boundary.
  • Fixed softmax for non-aligned tensors and cleaned up vtcm allocation in hexagon.
  • Fixed src1 handling in act ops and fixed empty src1 handling in swiglu and friends.
  • Fixed minor vtcm and dma handling in matmul.
  • Fixed hvx fallback path in hex-mm.
  • Fixed an issue with newer llvm merging repack and non-repack functions by using pointer differentiation for buffer types.
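
The cache-line fix above concerns the fact that cache maintenance works on whole lines, so a buffer that is not line-aligned shares its first and last lines with neighbouring memory. The sketch below only computes which lines a buffer touches. The function name, the struct, and the line size used in the usage note are illustrative assumptions and not values taken from the hexagon code.

```cpp
#include <cstddef>
#include <cstdint>

// First line address and number of lines that a flush or invalidate
// of [addr, addr + size) has to touch.
struct LineRange {
    uintptr_t first;
    size_t    count;
};

// line_size must be a power of two.
LineRange cache_lines_covering(uintptr_t addr, size_t size, size_t line_size) {
    const uintptr_t mask  = static_cast<uintptr_t>(line_size) - 1;
    const uintptr_t first = addr & ~mask;
    if (size == 0) {
        return { first, 0 };
    }
    const uintptr_t last = (addr + size - 1) & ~mask;  // line holding the last byte
    return { first, static_cast<size_t>((last - first) / line_size) + 1 };
}
```

With an assumed 64-byte line, a 100-byte buffer at address 100 covers three lines starting at address 64, so the maintained range (64 to 255) extends past both ends of the buffer (100 to 199).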

⚡ Deprecations

  • The GGML_HEXAGON_EXPERIMENTAL environment variable has been removed because it is no longer useful. Use GGML_HEXAGON_OPFILTER instead to disable Ops for debugging or validation.
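
As a rough illustration of how a regex-based op filter might behave, the sketch below treats the filter as a regular expression searched for within an op name, with a match meaning the op is disabled. This matching rule, the function names, and the op names in the usage note are assumptions. The notes only state that the filter is a regex and that it is used to disable Ops.

```cpp
#include <cstdlib>
#include <regex>
#include <string>

// Returns true when op_name should be skipped under the given filter.
// ASSUMPTION: a regex match anywhere in the name disables the op.
bool op_disabled(const std::string & op_name, const char * filter) {
    if (filter == nullptr || *filter == '\0') {
        return false;  // no filter set: nothing is disabled
    }
    const std::regex re(filter);  // throws std::regex_error on an invalid pattern
    return std::regex_search(op_name, re);
}

// Convenience wrapper that reads the variable named in these notes.
bool op_disabled_by_env(const std::string & op_name) {
    return op_disabled(op_name, std::getenv("GGML_HEXAGON_OPFILTER"));
}
```

Under these assumptions, a filter of `SOFT_MAX|MUL_MAT` would disable ops named `SOFT_MAX` and `MUL_MAT` and leave `ADD` enabled.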