b8578
📦 llama-cpp
✨ 1 feature · 🐛 1 fix · 🔧 3 symbols
Summary
This release focuses on DMA optimizations in the Hexagon backend. It mainly fixes performance regressions by introducing a mask cache and disabling unnecessary in-order descriptor processing.
✨ New Features
- Added a simple DMA cache for Mask in hex-fa so that mask rows are not fetched repeatedly.
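
As a rough illustration of the idea behind a mask row cache, here is a minimal sketch in C. It is not the hex-fa implementation: the names `mask_cache`, `mask_cache_get`, and `fake_dma_fetch`, the row size, the slot count, and the direct-mapped layout are all assumptions made for this example, and the DMA transfer is simulated with `memcpy`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for this sketch. */
#define MASK_ROW_BYTES   64
#define MASK_CACHE_SLOTS 4

typedef struct {
    int32_t  row[MASK_CACHE_SLOTS];                  /* mask row held by each slot, -1 = empty */
    uint8_t  data[MASK_CACHE_SLOTS][MASK_ROW_BYTES]; /* local copies of the cached rows */
    unsigned fetches;                                /* number of simulated DMA transfers */
} mask_cache;

static void mask_cache_init(mask_cache *c) {
    for (int i = 0; i < MASK_CACHE_SLOTS; i++) {
        c->row[i] = -1;
    }
    c->fetches = 0;
}

/* Stand-in for a real DMA transfer: copies one row and counts the transfer. */
static void fake_dma_fetch(mask_cache *c, uint8_t *dst, const uint8_t *src) {
    memcpy(dst, src, MASK_ROW_BYTES);
    c->fetches++;
}

/* Returns a local copy of mask row `row` (must be >= 0).
 * The row is fetched only if its slot does not already hold it. */
static const uint8_t *mask_cache_get(mask_cache *c, const uint8_t *mask, int32_t row) {
    int slot = row % MASK_CACHE_SLOTS;               /* direct-mapped slot selection */
    if (c->row[slot] != row) {
        fake_dma_fetch(c, c->data[slot], mask + (size_t)row * MASK_ROW_BYTES);
        c->row[slot] = row;
    }
    return c->data[slot];
}
```

In this sketch, asking for the same row twice in a row triggers a single transfer, while two rows that map to the same slot evict each other. A real cache would size and index its slots to match the access pattern of the attention kernel.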
🐛 Bug Fixes
- Unset the in-order descriptor bit in hex-dma. Setting it had caused a significant performance regression of about 3-4 tokens per second (TPS) during token generation.