
b8797

📦 llama-cpp
2 features · 🐛 1 fix · 🔧 3 symbols

Summary

This release focuses on Hexagon (HMX) performance. It introduces asynchronous workers and queues so that HMX matmul overlaps with the HVX dequant/DMA stages, and it fixes a futex race condition in hmx_worker_drain.

Migration Steps

  1. If your code relies on the previous synchronous HMX calls, note that HMX work is now dispatched asynchronously through the new hmx-queue mechanism.

✨ New Features

  • Introduced an async HMX worker (dedicated thread) to overlap HMX matmul with HVX dequant/DMA stages, replacing synchronous HMX calls.
  • Replaced the hmx-worker with hmx-queue in hex-hmx. The queue mimics the dma-queue interface, which simplifies the code and reduces thread wakeup round trips.
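
The overlap described above can be illustrated with a small, generic sketch. This is not the hex-hmx code: `SketchQueue`, `push`, `flush`, and `run_pipeline` are hypothetical names, and the arithmetic stages are stand-ins for the HVX dequant and HMX matmul steps. It only shows the general shape of a queue that hands work to a dedicated thread so the caller can prepare the next tile while the previous one is still being processed.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical queue (not the real hmx-queue API): push() enqueues a job for
// a dedicated worker thread, flush() blocks until every pushed job has run.
class SketchQueue {
public:
    SketchQueue() : worker_([this] { run(); }) {}

    ~SketchQueue() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    void push(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push_back(std::move(job));
            ++pending_;
        }
        cv_.notify_all();
    }

    void flush() {
        std::unique_lock<std::mutex> lk(m_);
        done_cv_.wait(lk, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stop requested, nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // run outside the lock so the caller can keep pushing
            {
                std::lock_guard<std::mutex> lk(m_);
                --pending_;
            }
            done_cv_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable cv_, done_cv_;
    std::deque<std::function<void()>> jobs_;
    int pending_ = 0;
    bool stop_ = false;
    std::thread worker_;  // declared last so all other members exist before it starts
};

// Stage A (stand-in for dequant) runs on the calling thread; stage B (stand-in
// for matmul) runs on the worker, so tile i+1 is staged while tile i is consumed.
std::vector<int> run_pipeline(const std::vector<int>& tiles) {
    std::vector<int> staged(tiles.size());
    std::vector<int> out(tiles.size());
    SketchQueue q;
    for (std::size_t i = 0; i < tiles.size(); ++i) {
        staged[i] = tiles[i] + 1;                                // stage A
        q.push([&staged, &out, i] { out[i] = staged[i] * 2; });  // stage B
    }
    q.flush();  // wait for all queued work before reading the results
    return out;
}
```

Calling `run_pipeline({1, 2, 3})` yields `{4, 6, 8}`. The caller only blocks once, in `flush()`, rather than after every tile, which is the kind of round-trip reduction the release note refers to.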

🐛 Bug Fixes

  • Fixed a futex race condition in hmx_worker_drain by storing the boolean in a local variable, so that it is loaded atomically once instead of twice.
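
The release notes do not show the actual hmx_worker_drain code, so the following is only a sketch of the general hazard, under assumptions. `wait_while_equal`, `drain`, and `drain_demo` are hypothetical names, and the yield loop is a stand-in for a real futex wait. The point is that the value used for the check and the value passed to the wait should come from the same atomic load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a futex-style wait: returns once `flag` no longer
// holds `expected`. A real futex would sleep in the kernel; this just yields.
inline void wait_while_equal(const std::atomic<bool>& flag, bool expected) {
    while (flag.load(std::memory_order_acquire) == expected) {
        std::this_thread::yield();
    }
}

// Drain using one atomic load per iteration. If the flag were loaded twice
// (once for the check, once for the wait argument), the worker could clear it
// in between, and the waiter would then wait on the already-final value.
inline void drain(const std::atomic<bool>& busy) {
    for (;;) {
        const bool snapshot = busy.load(std::memory_order_acquire);  // single load
        if (!snapshot) return;             // worker is idle, nothing to wait for
        wait_while_equal(busy, snapshot);  // wait on the value that was checked
    }
}

// Small driver: a worker thread clears the flag and drain() returns afterwards.
inline bool drain_demo() {
    std::atomic<bool> busy{true};
    std::thread worker([&busy] { busy.store(false, std::memory_order_release); });
    drain(busy);
    worker.join();
    return !busy.load(std::memory_order_acquire);
}
```

Keeping the snapshot in a local variable makes the check and the wait argument consistent by construction, whichever order the two threads happen to run in.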

Affected Symbols