b9470

📅 Jun 2, 2026 · 📦 llama-cpp

✨ 4 features · 🐛 5 fixes · 🔧 7 symbols

Summary

This release focuses mainly on cleanup and performance optimizations in the Hexagon backend, across both its HVX and HMX code paths, for matrix multiplication (MUL_MAT, MUL_MAT_ID), Flash Attention, and GDN. It also introduces initial F32 matmul support and fixes several fusion and stride bugs.

✨ New Features

Initial support for F32 * F32 -> F32 matmuls in hex-mm.
Added support for F32 * F32 -> F32 matmul_2d on HMX using Q4_0 dequantization to F16.
Re-introduced a more generic way of selecting between pipelined and non-pipelined modes in hmx-mm.
Initial version of MUL_MAT_ID support for HMX.
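The release notes do not show the kernels themselves, so for orientation here is a plain-C scalar sketch of what the two matmul ops compute. It assumes ggml's MUL_MAT convention (both operands store K contiguous F32 values per row, and dst[n][m] = sum over k of a[m][k] * b[n][k]) and a simplified MUL_MAT_ID layout in which each token's activation row is shared by all of its selected experts. The function names and layouts are hypothetical; this is not the hex-mm or hmx-mm code.

```c
#include <stddef.h>

/* Hypothetical scalar reference for F32 * F32 -> F32 matmul, assuming the
 * ggml MUL_MAT convention: a is [M][K], b is [N][K] (K contiguous per row),
 * and dst is [N][M] with dst[n][m] = sum_k a[m][k] * b[n][k]. */
static void ref_mul_mat_f32(const float *a, const float *b, float *dst,
                            size_t M, size_t N, size_t K) {
    for (size_t n = 0; n < N; n++) {
        for (size_t m = 0; m < M; m++) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; k++) {
                acc += a[m * K + k] * b[n * K + k];
            }
            dst[n * M + m] = acc;
        }
    }
}

/* Hypothetical, simplified MUL_MAT_ID reference: for each token t and each
 * of its n_used expert slots j, multiply the expert matrix chosen by
 * ids[t][j] with that token's activation row. */
static void ref_mul_mat_id_f32(const float *experts, /* [n_expert][M][K] */
                               const float *b,       /* [T][K]           */
                               const int *ids,       /* [T][n_used]      */
                               float *dst,           /* [T][n_used][M]   */
                               size_t M, size_t K, size_t T, size_t n_used) {
    for (size_t t = 0; t < T; t++) {
        for (size_t j = 0; j < n_used; j++) {
            const float *a = experts + (size_t)ids[t * n_used + j] * M * K;
            /* One activation row (N = 1) against the selected expert. */
            ref_mul_mat_f32(a, b + t * K, dst + (t * n_used + j) * M, M, 1, K);
        }
    }
}
```

A scalar reference like this is mostly useful as a correctness oracle: run it alongside an accelerated kernel on small inputs and compare the outputs within a tolerance.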

🐛 Bug Fixes

Fixed src1 stride handling in the fused rms_norm_mul op (hex-rms-norm).
Cleared the spad (scratchpad) pointers in ops that clobber the scratchpad, fixing failures in fused rms_norm_mul for qwen3.5-2B at specific batch sizes.
Fixed MXFP4 handling for MUL_MAT_ID in hmx-mm.
Fixed a bug in the hex-ops fusion logic that scrambled the order of the src tensors when some srcs were empty.
hex-fa now correctly falls back to HVX when sinks are present or the tensor dimensions are not supported.

Affected Symbols

hex-mm, hex-rms-norm, hex-ops, hmx-mm, hmx-fa, hex-gdn, hvx-utils/fa