b9498

📦 llama-cpp
4 features · 🐛 1 fix · 🔧 1 symbol

Summary

This release extends RVV (RISC-V Vector extension) quantization support in ggml-cpu, adding implementations for higher VLENs (vector register lengths of 512 and 1024 bits) across several quantization schemes. It also provides updated pre-built binaries for numerous platforms.

✨ New Features

  • Extended ggml-cpu RVV quantization vector dot implementation to support higher VLENs.
  • Added RVV 512-bit and 1024-bit implementations for iq4_xs quantization in ggml-cpu.
  • Added RVV 512-bit and 1024-bit implementations for q6_K and i-quants in ggml-cpu.
  • Added RVV 512-bit and 1024-bit implementations for iq3_s, iq3_xxs, iq2_s, iq2_xs, and iq2_xxs in ggml-cpu.
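
For readers unfamiliar with VLEN-specific kernels, the following is a minimal sketch of how code might choose between 128-, 256-, 512-, and 1024-bit implementations at run time. It is not the ggml-cpu code: the names `rvv_tier`, `select_tier`, and `detect_vlen_bits` are hypothetical and exist only for illustration. The one real API used is the RVV intrinsic `__riscv_vlenb()`, which returns VLEN in bytes and is only available when the compiler targets the V extension, so the sketch falls back to 0 on other builds.

```c
#include <stddef.h>

#if defined(__riscv_v_intrinsic)
#include <riscv_vector.h>
#endif

/* Hypothetical kernel tiers; these are illustrative names, not ggml symbols. */
enum rvv_tier {
    TIER_SCALAR = 0,
    TIER_128    = 128,
    TIER_256    = 256,
    TIER_512    = 512,
    TIER_1024   = 1024
};

/* Map a VLEN in bits to the widest tier that does not exceed it. */
enum rvv_tier select_tier(size_t vlen_bits) {
    if (vlen_bits >= 1024) return TIER_1024;
    if (vlen_bits >= 512)  return TIER_512;
    if (vlen_bits >= 256)  return TIER_256;
    if (vlen_bits >= 128)  return TIER_128;
    return TIER_SCALAR; /* no usable vector unit: use the scalar path */
}

/* Report VLEN in bits, or 0 when the build does not target RVV. */
size_t detect_vlen_bits(void) {
#if defined(__riscv_v_intrinsic)
    return (size_t)__riscv_vlenb() * 8; /* vlenb is VLEN in bytes */
#else
    return 0;
#endif
}
```

A caller would evaluate `select_tier(detect_vlen_bits())` once and branch to the matching kernel. Picking the widest tier that fits lets hardware with a VLEN above 1024 bits still use the 1024-bit path, which is an assumption of this sketch rather than a documented ggml-cpu behavior.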

🐛 Bug Fixes

  • Improved iq2_xs implementation for RVV 256-bit.

Affected Symbols