b8140
📦 llama-cpp
✨ 3 features · 🐛 4 fixes · 🔧 4 symbols
Summary
This release focuses on internal refactoring of the hexagon backend: various Ops now use local context structs, and ROPE was rewritten for better DMA/VTCM utilization, yielding minor performance gains. Snapdragon builds also gained support for larger ubatches.
✨ New Features
- Refactored all hexagon Ops (set/get/sum-rows, ROPE, Softmax, activation, unary ops) to use a local context struct for performance improvements via precomputation.
- Rewrote ROPE operation to utilize DMA and VTCM scratchpad, allowing for multi-row fetch/compute.
- Updated Snapdragon builds to support larger ubatches (size 256).
🐛 Bug Fixes
- Removed unused fields from op_context.
- Removed dependency on fastdiv in hex-rope implementation.
- Cleaned up supported type/dims checks in hexagon.
- Made all reduce functions replicate their result across lanes, removing the need to explicitly replicate the first value afterward.