b8557
📦 llama-cpp
✨ 4 features · 🐛 2 fixes · 🔧 5 symbols
Summary
This release adds support for the IQ4_NL and MXFP4 quantization types to the Hexagon backend, and fixes a mixed-quant model bug and a code-formatting issue.
✨ New Features
- Added support for IQ4_NL quantization type in the Hexagon backend, including buffer set/get tensor repack, mul_mat, and mul_mat_id dispatch.
- Implemented HVX IQ4_NL vec_dot kernels (1x1, 2x1, 2x2) with LUT-based 4-bit index to int8 kvalue dequantization.
- Added MXFP4 HMX dequantization path with E8M0 scale conversion, including a batch-4 fast path and single-tile fallback.
- Unified quantized row size / scale offset logic to handle Q4_0, Q8_0, IQ4_NL, and MXFP4 in the DMA fetch path.
🐛 Bug Fixes
- Fixed a SKIP_QUANTIZE src1 address mismatch when processing mixed-quant models on ggml-hexagon.
- Fixed incorrect pragma indentation.