Change8

b8140

📦 llama-cpp
3 features · 🐛 4 fixes · 🔧 4 symbols

Summary

This release consists mostly of internal refactoring in the Hexagon backend. Ops now use local context structs that hold precomputed values, and ROPE was rewritten to make better use of DMA and VTCM, which yields minor performance gains. Snapdragon builds were also updated to support larger ubatches.

✨ New Features

  • Refactored all Hexagon ops (set/get/sum-rows, ROPE, Softmax, activation, unary ops) to use a local context struct that holds precomputed values, improving performance.
  • Rewrote ROPE operation to utilize DMA and VTCM scratchpad, allowing for multi-row fetch/compute.
  • Updated Snapdragon builds to support larger ubatches (size 256).
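
The local context struct pattern from the first item can be illustrated with a minimal sketch in plain C. It is not the backend's actual code: the struct `sum_rows_ctx`, its fields, and both functions are hypothetical names chosen for illustration, and the op shown (a scalar row sum) only stands in for the real HVX kernels. The idea is that values which stay constant for the whole op, such as the row stride and the per-thread row range, are computed once at setup rather than inside the inner loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-op context (not the backend's real struct): everything
 * that is constant for the whole op is computed once and stored here. */
struct sum_rows_ctx {
    const float *src;        /* input, n_rows x n_cols, row-major */
    float       *dst;        /* output, one sum per row */
    size_t       n_cols;     /* elements per row */
    size_t       row_stride; /* elements between consecutive row starts */
    size_t       row_begin;  /* first row handled by this thread */
    size_t       row_end;    /* one past the last row for this thread */
};

/* Precompute the row range for thread `ith` of `nth` and the row stride. */
static struct sum_rows_ctx sum_rows_ctx_init(const float *src, float *dst,
                                             size_t n_rows, size_t n_cols,
                                             size_t ith, size_t nth) {
    assert(nth > 0 && ith < nth);

    size_t rows_per_thread = (n_rows + nth - 1) / nth; /* ceiling division */
    size_t begin = ith * rows_per_thread;
    size_t end   = begin + rows_per_thread;
    if (begin > n_rows) begin = n_rows;
    if (end   > n_rows) end   = n_rows;

    struct sum_rows_ctx ctx = {
        .src = src, .dst = dst,
        .n_cols = n_cols, .row_stride = n_cols,
        .row_begin = begin, .row_end = end,
    };
    return ctx;
}

/* The hot loop only reads precomputed fields from the context. */
static void sum_rows_run(const struct sum_rows_ctx *ctx) {
    for (size_t r = ctx->row_begin; r < ctx->row_end; r++) {
        const float *row = ctx->src + r * ctx->row_stride;
        float acc = 0.0f;
        for (size_t c = 0; c < ctx->n_cols; c++) {
            acc += row[c];
        }
        ctx->dst[r] = acc;
    }
}
```

Each worker thread would build its own context with its `ith` index and then call `sum_rows_run`, so no thread recomputes the work split or strides per row.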

🐛 Bug Fixes

  • Removed unused fields from op_context.
  • Removed dependency on fastdiv in hex-rope implementation.
  • Cleaned up the supported type/dims checks in the Hexagon backend.
  • All reduce functions now replicate their result across lanes, so callers no longer need to replicate the first value explicitly.
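
To make the last item concrete, here is a scalar model in C of a reduction whose result ends up in every lane. It is a sketch under assumptions, not the backend's implementation: the 8-lane width, the name `reduce_sum_all_lanes`, and the butterfly (XOR-partner) scheme are illustrative choices, and real HVX code would use vector intrinsics rather than an array loop. After log2(N_LANES) steps every lane holds the full sum, so no separate "broadcast lane 0" step is needed.

```c
#include <assert.h>
#include <string.h>

#define N_LANES 8 /* illustrative lane count; must be a power of two */

_Static_assert((N_LANES & (N_LANES - 1)) == 0, "N_LANES must be a power of two");

/* Butterfly reduction: at each step, lane i adds the value held by its
 * partner lane (i XOR stride). Once all strides are processed, every lane
 * contains the sum of all original lanes. */
static void reduce_sum_all_lanes(float v[N_LANES]) {
    float tmp[N_LANES];
    for (int stride = N_LANES / 2; stride > 0; stride /= 2) {
        for (int i = 0; i < N_LANES; i++) {
            tmp[i] = v[i] + v[i ^ stride];
        }
        memcpy(v, tmp, sizeof(tmp));
    }
}
```

A caller can then use the reduced vector directly, for example to scale every lane by the sum, without first copying the value out of one lane and splatting it.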

Affected Symbols