
b8557

📦 llama-cpp

Summary

This release adds support for the IQ4_NL and MXFP4 quantization types to the Hexagon backend, along with fixes for mixed-quant models and code formatting.

✨ New Features

  • Added support for IQ4_NL quantization type in the Hexagon backend, including buffer set/get tensor repack, mul_mat, and mul_mat_id dispatch.
  • Implemented HVX IQ4_NL vec_dot kernels (1x1, 2x1, 2x2) with LUT-based 4-bit index to int8 kvalue dequantization.
  • Added MXFP4 HMX dequantization path with E8M0 scale conversion, including a batch-4 fast path and single-tile fallback.
  • Unified quantized row size / scale offset logic to handle Q4_0, Q8_0, IQ4_NL, and MXFP4 in the DMA fetch path.
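The LUT-based dequantization behind the new vec_dot kernels can be sketched in scalar C. The codebook values below follow ggml's `kvalues_iq4nl` table; the struct and function names are illustrative, not the actual ggml-hexagon API, and the scale is shown as a plain float where ggml stores fp16:

```c
#include <assert.h>
#include <stdint.h>

/* Non-linear 4-bit codebook used by IQ4_NL (values from ggml). */
static const int8_t kvalues_iq4nl[16] = {
    -127, -104, -83, -65, -49, -35, -22, -10,
       1,   13,  25,  38,  53,  69,  89, 113,
};

/* One IQ4_NL block: a per-block scale plus 16 bytes of packed 4-bit
 * codebook indices covering 32 weights.  ggml stores the scale as
 * fp16; a float keeps this sketch self-contained. */
typedef struct {
    float   d;       /* block scale (fp16 in ggml)       */
    uint8_t qs[16];  /* 32 packed 4-bit codebook indices */
} block_iq4_nl_sketch;

/* Scalar reference dequantization: low nibbles map to weights 0..15,
 * high nibbles to weights 16..31.  The HVX kernels perform the same
 * index-to-int8 lookup with vector table operations. */
static void dequant_iq4_nl(const block_iq4_nl_sketch *b, float y[32]) {
    for (int j = 0; j < 16; ++j) {
        y[j]      = b->d * kvalues_iq4nl[b->qs[j] & 0x0f];
        y[j + 16] = b->d * kvalues_iq4nl[b->qs[j] >> 4];
    }
}
```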
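The E8M0 scale conversion in the MXFP4 path reduces to an exponent-field trick: E8M0 is an 8-bit biased exponent with no sign or mantissa, so its value is 2^(e − 127), which is exactly what placing `e` in a binary32 exponent field yields. A minimal sketch (function name illustrative):

```c
#include <stdint.h>
#include <string.h>

/* Convert an E8M0 scale (8-bit biased exponent, no sign/mantissa) to
 * float: the encoded value is 2^(e - 127).  Shifting e into the
 * binary32 exponent field gives that directly for 1 <= e <= 254. */
static float e8m0_to_fp32(uint8_t e) {
    uint32_t bits = (uint32_t)e << 23;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```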
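The unified row-size logic in the DMA fetch path comes down to per-type block geometry. A hedged sketch, assuming ggml's standard 32-weight block layouts (the enum, struct, and function names are illustrative, not the backend's actual symbols):

```c
#include <assert.h>
#include <stddef.h>

/* Per-type block geometry, assumed from ggml's block layouts: each
 * type packs 32 weights per block behind a leading scale. */
typedef struct {
    size_t block_weights; /* weights per block             */
    size_t block_bytes;   /* bytes per block, scale incl.  */
} quant_geom;

enum { Q4_0, Q8_0, IQ4_NL, MXFP4 };

static const quant_geom k_geom[] = {
    [Q4_0]   = { 32, 2 + 16 }, /* fp16 scale + 16B nibbles   */
    [Q8_0]   = { 32, 2 + 32 }, /* fp16 scale + 32B int8      */
    [IQ4_NL] = { 32, 2 + 16 }, /* fp16 scale + 16B indices   */
    [MXFP4]  = { 32, 1 + 16 }, /* E8M0 scale + 16B nibbles   */
};

/* Bytes a DMA fetch must move for one quantized row of n weights. */
static size_t quant_row_bytes(int type, size_t n) {
    return (n / k_geom[type].block_weights) * k_geom[type].block_bytes;
}
```

Note how Q4_0 and IQ4_NL share the same geometry while MXFP4 differs only in its one-byte scale, which is why a single table-driven path can serve all four types.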

🐛 Bug Fixes

  • Fixed a SKIP_QUANTIZE src1 address mismatch when processing mixed-quant models on ggml-hexagon.
  • Fixed pragma indentation.

Affected Symbols