b8797

📅 Apr 15, 2026 · 📦 llama-cpp

✨ 2 features · 🐛 1 fix · 🔧 3 symbols

Summary

This release improves Hexagon HMX performance by introducing an asynchronous worker and queue that overlap HMX matmul with the HVX dequant/DMA stages. It also fixes a race condition in the worker drain mechanism.

Migration Steps

If your code relies on the previous synchronous HMX calls, note that these calls are now asynchronous and go through the new hmx-queue mechanism.

✨ New Features

Introduced an async HMX worker (dedicated thread) to overlap HMX matmul with HVX dequant/DMA stages, replacing synchronous HMX calls.
Replaced the hmx-worker with hmx-queue in hex-hmx. The new queue mimics the dma-queue interface, which simplifies the code and reduces thread wakeup round trips.
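
To illustrate the general idea of overlapping two stages through a push/pop queue, here is a minimal sketch in portable C with pthreads. It is not the hex-hmx implementation: every name (`sketch_queue`, `sketch_queue_push`, `sketch_queue_pop`, `sketch_run_pipeline`), the queue depth, and the "multiply by 2" stand-in for the matmul stage are assumptions made for illustration only. The caller thread plays the role of the dequant/DMA stage, while a dedicated worker thread runs the queued jobs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_QUEUE_CAP = 4 };             /* hypothetical queue depth */

typedef void (*sketch_job_fn)(void *arg);
typedef struct { sketch_job_fn fn; void *arg; } sketch_job;

typedef struct {
    sketch_job      jobs[SKETCH_QUEUE_CAP];
    size_t          pushed, done, popped;  /* monotonically increasing counters */
    bool            stop;
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    pthread_t       thread;
} sketch_queue;

/* Worker thread: runs queued jobs in order, outside the lock. */
static void *sketch_queue_worker(void *p) {
    sketch_queue *q = p;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (!q->stop && q->done == q->pushed)
            pthread_cond_wait(&q->cv, &q->lock);
        if (q->done == q->pushed) break;   /* stop requested, nothing pending */
        sketch_job job = q->jobs[q->done % SKETCH_QUEUE_CAP];
        pthread_mutex_unlock(&q->lock);
        job.fn(job.arg);                   /* producer keeps working meanwhile */
        pthread_mutex_lock(&q->lock);
        q->done++;
        pthread_cond_broadcast(&q->cv);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

static int sketch_queue_init(sketch_queue *q) {
    q->pushed = q->done = q->popped = 0;
    q->stop = false;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cv, NULL);
    return pthread_create(&q->thread, NULL, sketch_queue_worker, q);
}

/* Submit a job; blocks only when all slots hold unfinished jobs. */
static void sketch_queue_push(sketch_queue *q, sketch_job_fn fn, void *arg) {
    pthread_mutex_lock(&q->lock);
    while (q->pushed - q->done == (size_t)SKETCH_QUEUE_CAP)
        pthread_cond_wait(&q->cv, &q->lock);
    q->jobs[q->pushed % SKETCH_QUEUE_CAP] = (sketch_job){ fn, arg };
    q->pushed++;
    pthread_cond_broadcast(&q->cv);
    pthread_mutex_unlock(&q->lock);
}

/* Wait for the oldest outstanding job to finish. */
static void sketch_queue_pop(sketch_queue *q) {
    pthread_mutex_lock(&q->lock);
    assert(q->popped < q->pushed);         /* every pop pairs with a push */
    while (q->done == q->popped)
        pthread_cond_wait(&q->cv, &q->lock);
    q->popped++;
    pthread_mutex_unlock(&q->lock);
}

static void sketch_queue_free(sketch_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_broadcast(&q->cv);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->thread, NULL);
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->cv);
}

typedef struct { int in, out; } sketch_tile;

/* Stand-in for the matmul stage (assumption: just doubles the input). */
static void sketch_matmul_stage(void *arg) {
    sketch_tile *t = arg;
    t->out = t->in * 2;
}

/* Prepare tile i on the caller thread while earlier tiles run on the worker. */
static int sketch_run_pipeline(int n_tiles) {
    sketch_tile tiles[SKETCH_QUEUE_CAP];
    sketch_queue q;
    int sum = 0;
    if (sketch_queue_init(&q) != 0) return -1;
    for (int i = 0; i < n_tiles; i++) {
        sketch_tile *t = &tiles[i % SKETCH_QUEUE_CAP];
        if (i >= SKETCH_QUEUE_CAP) {       /* retire oldest tile before reusing its slot */
            sketch_queue_pop(&q);
            sum += t->out;
        }
        t->in = i + 1;                     /* stand-in for the dequant/DMA stage */
        sketch_queue_push(&q, sketch_matmul_stage, t);
    }
    int first = n_tiles > SKETCH_QUEUE_CAP ? n_tiles - SKETCH_QUEUE_CAP : 0;
    for (int i = first; i < n_tiles; i++) { /* collect the remaining tiles */
        sketch_queue_pop(&q);
        sum += tiles[i % SKETCH_QUEUE_CAP].out;
    }
    sketch_queue_free(&q);
    return sum;
}
```

Calling `sketch_run_pipeline(10)` pushes ten tiles through the queue and returns the sum of their outputs. The point that matters for migration is that a result may only be read after the matching pop: once submission is asynchronous, returning from push no longer means the work has finished.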

🐛 Bug Fixes

Fixed a futex race condition in hmx_worker_drain: the boolean is now loaded once into a local variable instead of being atomically loaded twice.
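
The release notes do not show the code before or after the fix, so the following is only a sketch of the general class of bug, under stated assumptions. In a futex-style wait, the value that was checked and the value passed as "expected" should come from the same load. If they come from two separate loads, the flag can change in between and the waiter may go to sleep after the only wake has already been sent. All names here (`sketch_drain_state`, `sketch_futex_wait`, `sketch_futex_wake`, `sketch_drain`) are hypothetical, and a mutex plus condition variable stands in for a real futex so the example stays portable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_uint     busy;     /* 1 while the worker still has work in flight */
    int             result;
    pthread_mutex_t lock;
    pthread_cond_t  cv;
} sketch_drain_state;

/* Futex-like wait (emulated): sleep only if the flag still equals `expected`. */
static void sketch_futex_wait(sketch_drain_state *s, unsigned expected) {
    pthread_mutex_lock(&s->lock);
    if (atomic_load(&s->busy) == expected)
        pthread_cond_wait(&s->cv, &s->lock);
    pthread_mutex_unlock(&s->lock);
}

static void sketch_futex_wake(sketch_drain_state *s) {
    pthread_mutex_lock(&s->lock);
    pthread_cond_broadcast(&s->cv);
    pthread_mutex_unlock(&s->lock);
}

/*
 * Racy shape (assumed, for contrast only):
 *     if (atomic_load(&s->busy))
 *         sketch_futex_wait(s, atomic_load(&s->busy));
 * If the worker clears the flag between the two loads, the second load
 * returns 0, the wait is entered with expected == 0, and the waiter can
 * sleep with no wake left to come.
 */
static void sketch_drain(sketch_drain_state *s) {
    for (;;) {
        unsigned busy = atomic_load(&s->busy);  /* single load into a local */
        if (busy == 0)
            return;
        sketch_futex_wait(s, busy);             /* same value that was checked */
    }
}

static void *sketch_drain_thread(void *p) {
    sketch_drain_state *s = p;
    s->result = 42;                 /* "work" finished */
    atomic_store(&s->busy, 0);      /* publish completion, then wake the waiter */
    sketch_futex_wake(s);
    return NULL;
}

static int sketch_drain_demo(void) {
    sketch_drain_state s = { .result = 0 };
    pthread_t t;
    atomic_init(&s.busy, 1);
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cv, NULL);
    if (pthread_create(&t, NULL, sketch_drain_thread, &s) != 0)
        return -1;
    sketch_drain(&s);               /* returns only once busy has been cleared */
    assert(atomic_load(&s.busy) == 0);
    int r = s.result;
    pthread_join(t, NULL);
    pthread_mutex_destroy(&s.lock);
    pthread_cond_destroy(&s.cv);
    return r;
}
```

The loop around the wait also covers spurious wakeups: each iteration takes one fresh load and uses that single value for both the exit check and the wait.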

Affected Symbols

hmx_worker, hmx_queue, hmx_worker_drain