
b8578

📦 llama-cpp
1 feature · 🐛 1 fix · 🔧 3 symbols

Summary

This release focuses on DMA optimizations in the hexagon backend. It adds a DMA cache for the mask in hex-fa so that mask rows are not refetched repeatedly, and it fixes a performance regression in hex-dma by unsetting the in-order descriptor bit.

✨ New Features

  • Added a simple DMA cache for Mask in hex-fa to avoid repeated mask row refetching.
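The idea behind the mask cache can be illustrated with a minimal sketch. This is not the code from the release: the type and function names (`mask_cache_t`, `mask_cache_init`, `mask_cache_get`), the direct-mapped layout, the row length, and the slot count are all assumptions made for illustration, and `memcpy` stands in for a real DMA transfer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MASK_ROW_LEN    64  /* elements per mask row (hypothetical) */
#define MASK_CACHE_WAYS 4   /* number of cached rows (hypothetical) */

/* Hypothetical cache of mask rows held in fast local memory. */
typedef struct {
    int32_t  row[MASK_CACHE_WAYS];                 /* source row in each slot, -1 if empty */
    uint16_t data[MASK_CACHE_WAYS][MASK_ROW_LEN];  /* local copies of mask rows */
    unsigned fetches;                              /* number of simulated DMA transfers */
} mask_cache_t;

/* Mark every slot empty and reset the transfer counter. */
static void mask_cache_init(mask_cache_t *c) {
    for (int i = 0; i < MASK_CACHE_WAYS; i++) {
        c->row[i] = -1;
    }
    c->fetches = 0;
}

/* Return a local copy of mask row r, transferring it only on a cache miss. */
static const uint16_t *mask_cache_get(mask_cache_t *c, const uint16_t *mask, int32_t r) {
    assert(r >= 0);
    int slot = r % MASK_CACHE_WAYS;  /* direct-mapped: each row maps to one slot */
    if (c->row[slot] != r) {
        /* Miss: memcpy stands in for a DMA transfer of one mask row. */
        memcpy(c->data[slot], mask + (size_t)r * MASK_ROW_LEN, sizeof c->data[slot]);
        c->row[slot] = r;
        c->fetches++;
    }
    return c->data[slot];
}
```

In this sketch, calling `mask_cache_get` twice for the same row performs one transfer instead of two. A direct-mapped layout keeps the lookup to a single comparison, at the cost of evicting a row whenever another row maps to the same slot.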

🐛 Bug Fixes

  • Unset the in-order descriptor bit in hex-dma. With the bit set, token generation suffered a significant performance regression (3-4 TPS).

Affected Symbols