
b9519

📦 llama-cpp
2 features · 🐛 1 fix · 🔧 2 symbols

Summary

This release ports the multi-column MMVQ optimization to the SYCL backend for improved performance and fixes a bug where the reorder kernel was not being triggered correctly in ggml-sycl for multi-column operations.

✨ New Features

  • Ported the multi-column MMVQ optimization from the CUDA backend to the SYCL backend, improving weight reading efficiency.
  • Bootstrapped weight reordering on small multi-column batches (ne[1] <= 8) in ggml-sycl to ensure the faster reorder kernel is utilized.
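
To illustrate why a multi-column mat-vec can read weights more efficiently, here is a minimal CPU sketch in plain C++. It is not the ggml-sycl kernel: the function name `matvec_multi_col` is hypothetical, and it uses unquantized `float` weights and nested vectors as simplifying assumptions. It only shows the general idea of loading each weight once and reusing it for every column of a small batch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not part of ggml-sycl).
// Computes dst[c][r] = dot(weights[r], columns[c]) for a small batch of columns.
// Assumes every row of `weights` and every entry of `columns` has the same length.
std::vector<std::vector<float>> matvec_multi_col(
        const std::vector<std::vector<float>> & weights,   // nrows x k
        const std::vector<std::vector<float>> & columns) { // ncols x k
    const std::size_t nrows = weights.size();
    const std::size_t ncols = columns.size();

    std::vector<std::vector<float>> dst(ncols, std::vector<float>(nrows, 0.0f));

    for (std::size_t r = 0; r < nrows; ++r) {
        for (std::size_t k = 0; k < weights[r].size(); ++k) {
            const float w = weights[r][k];       // each weight is loaded once...
            for (std::size_t c = 0; c < ncols; ++c) {
                dst[c][r] += w * columns[c][k];  // ...and reused for every column
            }
        }
    }
    return dst;
}
```

Running a separate single-column mat-vec per column would instead walk the whole weight matrix once per column, which is the repeated read that a multi-column variant avoids.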

🐛 Bug Fixes

  • Fixed an issue in ggml-sycl where weight reordering was triggered only on single-token mat-vec, so speculative/MTP verification ran on the slower non-reorder kernel for multi-column mat-vec.
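
The change in trigger condition can be sketched as two predicates on the column count `ne[1]`. This is a hedged illustration, not the actual ggml-sycl code: the function and constant names are hypothetical, and only the single-token condition and the `ne[1] <= 8` bound come from the notes above.

```cpp
#include <cstdint>

// Hypothetical names for illustration; the real ggml-sycl checks may involve
// additional conditions that these notes do not describe.
constexpr std::int64_t kMaxReorderCols = 8;  // bound taken from "ne[1] <= 8"

// Before the fix: only single-token mat-vec triggered the weight reorder.
bool triggers_reorder_before(std::int64_t ne1) {
    return ne1 == 1;
}

// After the fix: small multi-column batches also trigger it.
bool triggers_reorder_after(std::int64_t ne1) {
    return ne1 <= kMaxReorderCols;
}
```

Under these assumptions, a verify step with 4 columns would not have triggered the reorder before the fix but does afterwards, while a batch of 9 columns is outside the bound in both cases.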

Affected Symbols