b8557
📦 llama-cpp
✨ 4 features · 🐛 2 fixes · 🔧 5 symbols
Summary
This release adds support for the IQ4_NL and MXFP4 quantization types to the Hexagon backend, and fixes a mixed-quant model bug and a code-formatting issue.
✨ New Features
- Added support for IQ4_NL quantization type in the Hexagon backend, including buffer set/get tensor repack, mul_mat, and mul_mat_id dispatch.
- Implemented HVX IQ4_NL vec_dot kernels (1x1, 2x1, 2x2) with LUT-based 4-bit index to int8 kvalue dequantization.
- Added MXFP4 HMX dequantization path with E8M0 scale conversion, including a batch-4 fast path and single-tile fallback.
- Unified quantized row size / scale offset logic to handle Q4_0, Q8_0, IQ4_NL, and MXFP4 in the DMA fetch path.
🐛 Bug Fixes
- Fixed a SKIP_QUANTIZE src1 address mismatch when processing mixed-quant models on ggml-hexagon.
- Fixed incorrect pragma indentation.