b8797
📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 3 symbols
Summary
This release improves Hexagon (HMX) performance by introducing an asynchronous worker and queue that overlap computation stages, and fixes a race condition in the worker drain mechanism.
Migration Steps
- If your code relies on the previous synchronous HMX calls, note that HMX work is now submitted asynchronously through the new hmx-queue mechanism.
✨ New Features
- Introduced an async HMX worker (dedicated thread) to overlap HMX matmul with HVX dequant/DMA stages, replacing synchronous HMX calls.
- Replaced the hmx-worker with hmx-queue in hex-hmx. The queue mimics the dma-queue interface, which simplifies the code and reduces thread wakeup round trips.
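To illustrate the general pattern behind these two features, here is a minimal host-side sketch of a push/pop work queue served by one dedicated thread. It is not the llama.cpp implementation: every name (`wq_t`, `wq_push`, `wq_pop`, `WQ_CAP`) is hypothetical, and it uses portable pthreads rather than Hexagon primitives. The caller pushes jobs without blocking, keeps doing its own work, and pops later to wait for the oldest outstanding job.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define WQ_CAP 8 /* hypothetical queue depth */

typedef void (*wq_fn)(void *arg);
typedef struct { wq_fn fn; void *arg; } wq_job;

typedef struct {
    wq_job          jobs[WQ_CAP];
    size_t          pushed, started, completed, popped; /* monotonic counters */
    bool            stop;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    pthread_t       thread;
} wq_t;

/* Dedicated worker: runs jobs in FIFO order, outside the lock. */
static void *wq_worker_main(void *p) {
    wq_t *q = p;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->started == q->pushed && !q->stop)
            pthread_cond_wait(&q->cond, &q->lock);
        if (q->started == q->pushed) break; /* stop requested, nothing left */
        wq_job job = q->jobs[q->started % WQ_CAP];
        q->started++;
        pthread_mutex_unlock(&q->lock);
        job.fn(job.arg);
        pthread_mutex_lock(&q->lock);
        q->completed++;
        pthread_cond_broadcast(&q->cond);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

int wq_init(wq_t *q) {
    assert(q != NULL);
    q->pushed = q->started = q->completed = q->popped = 0;
    q->stop = false;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    return pthread_create(&q->thread, NULL, wq_worker_main, q);
}

/* Non-blocking submit; returns false when all slots are outstanding. */
bool wq_push(wq_t *q, wq_fn fn, void *arg) {
    pthread_mutex_lock(&q->lock);
    bool ok = (q->pushed - q->popped) < WQ_CAP;
    if (ok) {
        q->jobs[q->pushed % WQ_CAP] = (wq_job){ fn, arg };
        q->pushed++;
        pthread_cond_broadcast(&q->cond);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Waits for the oldest outstanding job; returns false if there is none. */
bool wq_pop(wq_t *q) {
    pthread_mutex_lock(&q->lock);
    bool ok = q->popped < q->pushed;
    if (ok) {
        while (q->completed == q->popped)
            pthread_cond_wait(&q->cond, &q->lock);
        q->popped++;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

void wq_destroy(wq_t *q) {
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_broadcast(&q->cond);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->thread, NULL);
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->cond);
}

/* Stand-in for a per-tile compute step. */
static void square_job(void *arg) { int *v = arg; *v = (*v) * (*v); }

int wq_demo(void) {
    wq_t q;
    int tiles[4] = { 1, 2, 3, 4 };
    if (wq_init(&q) != 0) return -1;
    for (int i = 0; i < 4; i++)
        wq_push(&q, square_job, &tiles[i]); /* 4 jobs fit in WQ_CAP slots */
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        wq_pop(&q); /* blocks until tiles[i] has been processed */
        sum += tiles[i];
    }
    wq_destroy(&q);
    return sum;
}
```

In this sketch the submitting thread is free between `wq_push` and `wq_pop`, which is where a pipeline would prepare the next tile. The release notes attribute the real gain to overlapping HMX matmul with HVX dequant/DMA stages; the sketch only shows the queueing shape, not those stages.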
🐛 Bug Fixes
- Fixed a futex race condition in hmx_worker_drain by storing the boolean to a local variable to avoid loading it atomically twice.
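The following sketch shows one plausible shape of this class of bug and its fix; it is an assumption about the general pattern, not the actual `hmx_worker_drain` code. A futex-style wait takes an expected value and sleeps only while the word still equals it. If the drain path loads the flag once to decide whether to wait and a second time to supply the expected value, the worker can clear the flag between the two loads, so the waiter sleeps on the already-final value with nobody left to wake it. Taking a single snapshot into a local removes that window. The names `worker_t`, `busy`, and `wait_while_equal` are hypothetical, and the wait is emulated with a yield loop so the example runs on any POSIX host.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_uint busy;   /* 1 while the worker still has work, 0 when idle */
    int         result; /* written by the worker before it clears busy */
} worker_t;

/* Emulates futex-wait semantics: returns once *addr differs from expected. */
static void wait_while_equal(atomic_uint *addr, unsigned expected) {
    while (atomic_load_explicit(addr, memory_order_acquire) == expected)
        sched_yield();
}

/*
 * Racy variant (do not use):
 *   if (atomic_load(&w->busy))
 *       wait_while_equal(&w->busy, atomic_load(&w->busy));
 * The second load may already read 0, so the wait targets the final value.
 */
void worker_drain(worker_t *w) {
    assert(w != NULL);
    for (;;) {
        /* Single snapshot: the decision and the expected value agree. */
        unsigned seen = atomic_load_explicit(&w->busy, memory_order_acquire);
        if (seen == 0) return;
        wait_while_equal(&w->busy, seen);
    }
}

static void *drain_worker_main(void *p) {
    worker_t *w = p;
    w->result = 42; /* the "work" */
    atomic_store_explicit(&w->busy, 0, memory_order_release);
    return NULL;
}

int drain_demo(void) {
    worker_t w = { .result = 0 };
    atomic_init(&w.busy, 1);
    pthread_t t;
    if (pthread_create(&t, NULL, drain_worker_main, &w) != 0) return -1;
    worker_drain(&w); /* returns only after busy is observed as 0 */
    int r = w.result; /* visible via the release/acquire pair on busy */
    pthread_join(t, NULL);
    return r;
}
```

With a real futex the racy variant can block indefinitely, whereas the yield-loop stand-in here would merely spin, so the sketch demonstrates the fixed structure rather than reproducing the hang.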