b9784

Breaking Changes

📅 Jun 24, 2026 · 📦 llama-cpp

⚠ 2 breaking · ✨ 4 features · 🐛 8 fixes · 🔧 11 symbols

Summary

This release is a major rework of Hexagon matrix multiplication (MUL_MAT/MUL_MAT_ID). It introduces a new tiled weight repacking scheme and performance optimizations for both the HVX and HMX kernels. Support for hardware older than architecture v73 has been removed.
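As a general illustration of tiled weight repacking, the Python sketch below regroups a row-major matrix into contiguous 32x32 blocks, the tile size given in these notes. It is a hypothetical sketch: `repack_tiled` and `tiled_index` are illustrative names, and the actual Hexagon layout (which also handles quantized weight blocks) is not described here and likely differs.

```python
TILE = 32  # tile edge length (32x32 tiles, per the release notes)


def repack_tiled(weights, rows, cols, tile=TILE):
    """Repack a flat row-major rows x cols matrix into contiguous tile x tile blocks.

    Tiles are emitted in row-major tile order, and each tile is itself
    row-major. For brevity, rows and cols must be multiples of `tile`;
    this is an illustrative layout, not the backend's actual format.
    """
    if rows % tile or cols % tile:
        raise ValueError("rows and cols must be multiples of the tile size")
    out = []
    for tile_row in range(rows // tile):
        for tile_col in range(cols // tile):
            for r in range(tile):
                start = (tile_row * tile + r) * cols + tile_col * tile
                out.extend(weights[start:start + tile])
    return out


def tiled_index(row, col, cols, tile=TILE):
    """Return the position of element (row, col) in the repacked buffer."""
    tiles_per_row = cols // tile
    tile_id = (row // tile) * tiles_per_row + (col // tile)
    return tile_id * tile * tile + (row % tile) * tile + (col % tile)


w = list(range(64 * 64))        # 64x64 matrix where each value equals its flat index
packed = repack_tiled(w, 64, 64)
print(packed[32 * 32])          # first element of the second tile is original (0, 32)
```

The point of the layout is that each 32x32 block becomes one contiguous chunk of memory, so a kernel can load a whole tile with sequential reads instead of striding across rows.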

⚠️ Breaking Changes

Support for hardware architecture versions older than v73 has been removed because HMX is now required for most use cases.
The new tiled repack format (renamed from x4x2) is now permanent; older formats are removed.

Migration Steps

Ensure the target hardware architecture is v73 or newer; older architectures are no longer supported.
Update your build system to use the new tiled repack format (formerly x4x2) consistently.
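If deployment tooling needs a pre-flight check for the v73 minimum, a helper along these lines could work. This is a hypothetical sketch, not part of llama.cpp: `parse_hexagon_arch`, `is_supported`, and the assumption that architectures are named like `v73` are illustrative.

```python
import re

MIN_HEXAGON_ARCH = 73  # minimum architecture supported by this release


def parse_hexagon_arch(name):
    """Parse an architecture string such as 'v73' or 'V75' into an integer."""
    m = re.fullmatch(r"[vV](\d+)", name.strip())
    if m is None:
        raise ValueError(f"unrecognized Hexagon architecture: {name!r}")
    return int(m.group(1))


def is_supported(name):
    """Return True if the architecture meets the v73 minimum (HMX required)."""
    return parse_hexagon_arch(name) >= MIN_HEXAGON_ARCH


for arch in ("v68", "v69", "v73", "v75", "v79"):
    print(arch, "supported" if is_supported(arch) else "dropped")
```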

✨ New Features

Reworked the MUL_MAT and MUL_MAT_ID operations in the Hexagon backend, including 32x32 tiled weight repacking, kernel-params, and cached graphs.
Added support for non-tiled matrix multiplication (mm) as a fallback option in hex-mm.
Added support for simple graph caching to avoid recomputing kernel-params.
Enabled HMX for all builds via a CMake update.
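Graph caching of this kind is essentially memoization: kernel parameters are computed once per distinct graph and reused on later runs of the same graph. The Python sketch below shows that idea under stated assumptions. `KernelParams`, `compute_kernel_params`, and `GraphCache` are hypothetical names, the cache key (op names plus shapes) is an assumption, and the rule that picks the non-tiled fallback when a dimension is not a multiple of 32 is illustrative only, not the backend's actual selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelParams:
    """Hypothetical per-op parameters derived from tensor shapes."""
    use_tiled: bool   # False -> fall back to a non-tiled (flat) kernel
    n_row_tiles: int
    n_col_tiles: int


def compute_kernel_params(rows, cols, tile=32):
    """Stand-in for the host-side planning step that caching avoids repeating."""
    use_tiled = rows % tile == 0 and cols % tile == 0
    # Ceiling division so partial tiles are still counted.
    return KernelParams(use_tiled, -(-rows // tile), -(-cols // tile))


class GraphCache:
    """Memoize kernel params per graph, keyed by the graph's op shapes."""

    def __init__(self):
        self._cache = {}
        self.misses = 0

    def params_for(self, graph):
        # A graph is modeled as a tuple of (op_name, rows, cols) entries,
        # so structurally identical graphs map to the same key.
        key = tuple(graph)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = [compute_kernel_params(r, c) for _, r, c in graph]
        return self._cache[key]


cache = GraphCache()
g = (("MUL_MAT", 4096, 4096), ("MUL_MAT_ID", 2048, 1000))
p1 = cache.params_for(g)
p2 = cache.params_for(g)          # second call is served from the cache
print(cache.misses, p1[1].use_tiled)
```

Keying on shapes rather than tensor data is what makes the cache cheap. The trade-off is that any change in graph structure causes a miss and a fresh planning step.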

🐛 Bug Fixes

Fixed HMX/HVX fallback logic and MUL_MAT_ID allocation, restoring OLMoE support.
Fixed matmul-id kernel-params selection, restoring OLMoE and LFM support.
Fixed the HVX flat fallback so that all MUL_MAT tests pass.
Restored pipelined mode in HMX-MM.
Changed the HVX-MM tiled kernels to accumulate in fp32, improving accuracy with no loss in performance.
Fixed HVX-MM loop unrolling and removed unnecessary masking for tiled accumulators.
Fixed MUL_MAT_ID kernel_param handling to ensure host/NPU synchronization.
Relaxed the hardcoded check requiring row counts to be a multiple of 256; the backend now relies on VTCM size requirements instead.
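The fp32-accumulation fix for the HVX-MM tiled kernels addresses a general numeric effect. When many fp16 products are summed into an fp16 accumulator, small addends are eventually rounded away once the running sum grows. The Python sketch below emulates half and single precision with the `struct` module's `e` and `f` formats. It shows only the rounding behavior and does not reproduce the HVX kernel itself.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Dot product of fp16 inputs, rounding the accumulator after every add."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_f16(x) * to_f16(y)   # operands are fp16 values
        acc = round_acc(acc + prod)    # accumulator precision under test
    return acc


n = 4096
a = [0.1] * n
b = [1.0] * n
exact = n * to_f16(0.1)               # reference sum of the fp16-rounded inputs
print("fp16 accumulate:", dot(a, b, to_f16))
print("fp32 accumulate:", dot(a, b, to_f32))
print("reference      :", exact)
```

With an fp16 accumulator, the sum stalls once the spacing between representable values exceeds twice the addend. The fp32 accumulator has enough mantissa bits to keep every partial sum in this example exact.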

Affected Symbols

MUL_MAT, MUL_MAT_ID, hex-mm, hvx-mm, hmx-mm, vec_dots, qweight, vtcm_weight, kernel-params, matmul-ops.h, GGML_HEXAGON_MM_SELECT