b8578

📅 Mar 29, 2026 · 📦 llama-cpp

✨ 1 feature · 🐛 1 fix · 🔧 3 symbols

Summary

This release focuses on DMA optimizations in the Hexagon backend: a mask cache in hex-fa avoids refetching mask rows, and unnecessary in-order descriptor processing is disabled in hex-dma to fix a performance regression.

✨ New Features

Added a simple DMA cache for the mask in hex-fa so that the same mask rows are not fetched repeatedly.
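The idea behind such a cache can be illustrated with a minimal, host-side sketch in C. This is not the hex-fa implementation: the type `mask_cache`, the functions `mask_cache_init`, `mask_cache_get`, and `fake_dma_fetch_row`, the direct-mapped layout, and the sizes are all hypothetical, and a `memcpy` stands in for a real DMA transfer. It only shows how remembering which row each local slot holds lets a repeated request skip the transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MASK_ROW_LEN     8  /* elements per mask row (illustrative value) */
#define MASK_CACHE_SLOTS 4  /* rows kept in fast local memory (illustrative) */

/* Hypothetical cache state; names are assumptions, not llama.cpp symbols. */
typedef struct {
    const uint16_t *src;                            /* mask in slow memory */
    uint16_t rows[MASK_CACHE_SLOTS][MASK_ROW_LEN];  /* local row copies */
    int32_t  tag[MASK_CACHE_SLOTS];                 /* row held per slot, -1 = empty */
    size_t   n_fetches;                             /* simulated DMA transfers */
} mask_cache;

static void mask_cache_init(mask_cache *c, const uint16_t *src) {
    c->src = src;
    c->n_fetches = 0;
    for (int i = 0; i < MASK_CACHE_SLOTS; i++) {
        c->tag[i] = -1;
    }
}

/* Stand-in for a DMA transfer: copies one row into a local slot. */
static void fake_dma_fetch_row(mask_cache *c, int slot, int32_t row) {
    memcpy(c->rows[slot], c->src + (size_t)row * MASK_ROW_LEN,
           sizeof c->rows[slot]);
    c->n_fetches++;
}

/* Returns a pointer to the local copy of `row`, fetching only on a miss. */
static const uint16_t *mask_cache_get(mask_cache *c, int32_t row) {
    assert(row >= 0);
    int slot = (int)(row % MASK_CACHE_SLOTS);  /* direct-mapped placement */
    if (c->tag[slot] != row) {
        fake_dma_fetch_row(c, slot, row);
        c->tag[slot] = row;
    }
    return c->rows[slot];
}
```

In this sketch, calling `mask_cache_get` twice in a row for the same row index increments `n_fetches` only once. A request for a different row that maps to the same slot evicts the earlier row and triggers a new transfer.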

🐛 Bug Fixes

Unset the in-order descriptor bit in hex-dma. Having it set caused a significant performance regression of roughly 3-4 tokens per second (TPS) during token generation.

Affected Symbols

hex-fa, hex-dma, hex-rope