b9498
📦 llama-cpp
✨ 4 features · 🐛 1 fix · 🔧 1 symbol
Summary
This release extends RVV (RISC-V Vector extension) quantization support in ggml-cpu, adding implementations for higher vector register lengths (VLEN) across several quantization schemes. It also provides updated pre-built binaries for numerous platforms.
✨ New Features
- Extended ggml-cpu RVV quantization vector dot implementation to support higher VLENs.
- Added RVV 512-bit and 1024-bit implementations for iq4_xs quantization in ggml-cpu.
- Added RVV 512-bit and 1024-bit implementations for q6_K and i-quants in ggml-cpu.
- Added RVV 512-bit and 1024-bit implementations for tq3_s, iq3_xxs, iq2_s, iq2_xs, and iq2_xxs in ggml-cpu.
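
As a rough illustration of what "supporting higher VLENs" can mean for a vector dot routine, the sketch below picks a kernel width from the hardware vector length and processes the input in strips of that width. It is a portable C emulation written under assumptions, not the ggml-cpu code: the names `kernel_id`, `select_kernel`, and `vec_dot_i8` are hypothetical, plain int8 data stands in for the quantized block formats, and the inner loop stands in for RVV instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel identifiers; each value is the register width in bits. */
typedef enum {
    KERNEL_SCALAR = 0,
    KERNEL_VL256  = 256,
    KERNEL_VL512  = 512,
    KERNEL_VL1024 = 1024
} kernel_id;

/* Pick the widest variant that fits the reported VLEN (assumed policy). */
kernel_id select_kernel(unsigned vlen_bits) {
    if (vlen_bits >= 1024) return KERNEL_VL1024;
    if (vlen_bits >= 512)  return KERNEL_VL512;
    if (vlen_bits >= 256)  return KERNEL_VL256;
    return KERNEL_SCALAR;
}

/* Dot product of two int8 vectors, processed in strips whose size matches
 * the selected kernel. The inner loop emulates one vector operation. */
int32_t vec_dot_i8(const int8_t *x, const int8_t *y, size_t n, unsigned vlen_bits) {
    assert(x != NULL && y != NULL);

    kernel_id k = select_kernel(vlen_bits);
    /* Number of int8 lanes per register; 1 for the scalar fallback. */
    size_t lanes = (k == KERNEL_SCALAR) ? 1 : (size_t)k / 8;

    int32_t sum = 0;
    for (size_t i = 0; i < n; i += lanes) {
        /* Shorten the final strip so the tail is handled without overrun. */
        size_t vl = (n - i < lanes) ? (n - i) : lanes;
        for (size_t j = 0; j < vl; ++j) {
            sum += (int32_t)x[i + j] * (int32_t)y[i + j];
        }
    }
    return sum;
}
```

Every width returns the same result; only the strip size changes. A real kernel would differ mainly in how many quantized blocks it can unpack and accumulate per instruction at each width.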
🐛 Bug Fixes
- Improved iq2_xs implementation for RVV 256-bit.