b8140
📦 llama-cpp
✨ 3 features · 🐛 4 fixes · 🔧 4 symbols
Summary
This release focuses on internal refactoring of the hexagon backend: various Ops now use local context structs, and ROPE was rewritten for better DMA/VTCM utilization, yielding minor performance gains. Snapdragon builds also gained support for larger ubatches.
✨ New Features
- Refactored all hexagon Ops (set/get/sum-rows, ROPE, Softmax, activation, unary ops) to use a local context struct for performance improvements via precomputation.
- Rewrote ROPE operation to utilize DMA and VTCM scratchpad, allowing for multi-row fetch/compute.
- Updated Snapdragon builds to support larger ubatches (size 256).
🐛 Bug Fixes
- Removed unused fields from op_context.
- Removed dependency on fastdiv in hex-rope implementation.
- Cleaned up supported type/dims checks in hexagon.
- Made all reduce functions replicate their result across lanes, removing the need to explicitly replicate the first value afterward.