b9519

📅 Jun 5, 2026 · 📦 llama-cpp

✨ 2 features · 🐛 1 fix · 🔧 2 symbols

Summary

This release ports the multi-column MMVQ optimization to the SYCL backend for improved performance and fixes a bug where the reorder kernel was not being triggered correctly in ggml-sycl for multi-column operations.

✨ New Features

Ported multi-column MMVQ optimization from CUDA backend to SYCL backend, improving weight reading efficiency.
Bootstrapped weight reordering on small multi-column batches (ne[1] <= 8) in ggml-sycl to ensure the faster reorder kernel is utilized.
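The idea behind the multi-column optimization can be illustrated with a minimal CPU sketch. This is not the ggml-sycl or CUDA kernel: the function name `matvec_multi_col`, the constant `kMaxCols`, the memory layout, and the use of plain `float` weights instead of quantized blocks are all assumptions made for illustration. It only shows why handling several columns per pass improves weight reading efficiency: each weight is loaded once and reused for every column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not the actual ggml-sycl kernel.
// kMaxCols mirrors the ne[1] <= 8 bound mentioned in the release notes.
constexpr std::size_t kMaxCols = 8;

// W: row-major weights, nrows x k.
// X: ncols activation columns, each of length k, stored column after column.
// Y: ncols output columns, each of length nrows, stored column after column.
void matvec_multi_col(const std::vector<float>& W,
                      const std::vector<float>& X,
                      std::vector<float>& Y,
                      std::size_t nrows, std::size_t k, std::size_t ncols) {
    assert(ncols >= 1 && ncols <= kMaxCols);
    assert(W.size() >= nrows * k);
    assert(X.size() >= ncols * k);
    assert(Y.size() >= ncols * nrows);

    for (std::size_t r = 0; r < nrows; ++r) {
        float acc[kMaxCols] = {0.0f};  // one accumulator per column
        for (std::size_t i = 0; i < k; ++i) {
            const float w = W[r * k + i];  // weight is read once...
            for (std::size_t c = 0; c < ncols; ++c) {
                acc[c] += w * X[c * k + i];  // ...and reused for every column
            }
        }
        for (std::size_t c = 0; c < ncols; ++c) {
            Y[c * nrows + r] = acc[c];
        }
    }
}
```

Running a single-column mat-vec once per column would instead read the whole weight matrix `ncols` times, which is the cost the multi-column path is meant to avoid.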

🐛 Bug Fixes

Fixed an issue in ggml-sycl where weight reordering was triggered only for single-token mat-vec, so the multi-column mat-vec used by speculative/MTP verify ran on the slower non-reorder kernel.
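The change in trigger condition can be sketched as a pair of predicates. The function names and the constant below are hypothetical, and the real check in ggml-sycl presumably also depends on factors not covered in these notes (tensor type, device support, and so on). The sketch models only the batch width `ne[1]`, assuming the old condition was equivalent to `ne[1] == 1`.

```cpp
#include <cstdint>

// Hypothetical names; only the ne[1] <= 8 bound comes from the release notes.
constexpr std::int64_t kMaxReorderCols = 8;

// Assumed behavior before the fix: reorder only for single-token mat-vec.
bool should_reorder_before(std::int64_t ne1) {
    return ne1 == 1;
}

// Assumed behavior after the fix: small multi-column batches also qualify.
bool should_reorder_after(std::int64_t ne1) {
    return ne1 >= 1 && ne1 <= kMaxReorderCols;
}
```

Under this model, a verify step with four draft tokens (`ne[1] == 4`) would have skipped reordering before the fix and triggers it afterwards.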

Affected Symbols

ggml-cuda/mmvq.cu
ggml-sycl