b8495
📦 llama-cpp
✨ 5 features · 🐛 5 fixes · 🔧 4 symbols
Summary
This release focuses on DMA and binary-operation fixes for the Hexagon backend, improving the handling of large strides and resolving issues in ssm-conv. It also introduces platform-specific binary updates for broad compatibility.
Migration Steps
- If you rely on specific VTCM allocation behavior for ssm-conv, be aware that single-page allocation is now the default.
- If you use hex-dma with large strides, note that the 16-bit stride limitation no longer applies on v75 and up, where hex-dma now uses 2d-wide mode.
✨ New Features
- Chained DMA is now the default for hex-dma to support newer models.
- Added uint32 dump helper for hexagon backend.
- Hexagon now uses single-page VTCM allocation to resolve issues with large gather operations in ssm-conv.
- Hex-dma now uses 1d mode for reshaping, which supports 24-bit sizes (about 16 MB).
- Hex-dma starts using 2d-wide mode on v75 and up, removing the 16-bit stride limitation.
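The two limits mentioned above come down to field-width arithmetic, which can be sketched as follows. This is only an illustration: the helper names are hypothetical and are not part of the Hexagon backend, and it assumes strides and sizes are counted in bytes and stored in unsigned fields of the stated widths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field widths taken from the release notes; the names are hypothetical. */
enum { STRIDE_BITS_2D = 16, SIZE_BITS_1D = 24 };

/* True if `value` can be encoded in an unsigned field of `bits` bits. */
static inline bool fits_in_bits(uint64_t value, unsigned bits) {
    assert(bits > 0 && bits < 64);
    return value < (UINT64_C(1) << bits);
}

/* Assumption: a row stride must fit a 16-bit field (largest value 65535). */
static inline bool stride_fits_16bit(uint64_t stride_bytes) {
    return fits_in_bits(stride_bytes, STRIDE_BITS_2D);
}

/* Assumption: a flat 1d transfer size must fit a 24-bit field. */
static inline bool size_fits_1d(uint64_t size_bytes) {
    return fits_in_bits(size_bytes, SIZE_BITS_1D);
}
```

Under these assumptions, a 96 KiB row stride (98304 bytes) does not fit a 16-bit stride field, while a 12 MiB flat transfer still fits a 24-bit size field.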
🐛 Bug Fixes
- Fixed incorrect stride logic in hex-bin.
- Ensured repack buffers are dumped for verbose level > 2.
- Hex-bin now consistently uses dma_queue_push even for dummy destination transactions.
- Cleaned up the hex-bin kernel selection logic.
- Cleaned up the hex-bin binary op core and fixed transposed tensor handling.
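To see why stride logic and transposed tensors are closely related, here is a minimal sketch of stride-based addressing for a 2-D view. It does not reproduce the hex-bin code: the `view2d` type and its helpers are hypothetical, and the sketch only assumes the common convention in which a transpose swaps element counts and byte strides without moving any data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 2-D view: ne = element counts, nb = byte strides per dimension. */
typedef struct {
    const uint8_t *data;
    size_t ne[2];
    size_t nb[2];
} view2d;

/* Read element (i0, i1) using byte strides only, never assuming contiguity. */
static inline float view2d_get(const view2d *v, size_t i0, size_t i1) {
    float out;
    assert(i0 < v->ne[0] && i1 < v->ne[1]);
    memcpy(&out, v->data + i0 * v->nb[0] + i1 * v->nb[1], sizeof out);
    return out;
}

/* Transpose by swapping counts and strides; the underlying buffer is untouched. */
static inline view2d view2d_transpose(view2d v) {
    view2d t = v;
    t.ne[0] = v.ne[1]; t.ne[1] = v.ne[0];
    t.nb[0] = v.nb[1]; t.nb[1] = v.nb[0];
    return t;
}
```

In a transposed view the innermost stride is no longer the element size, so any kernel that steps through memory by `sizeof(float)` instead of `nb[0]` would read the wrong elements.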