b8685

📅 Apr 7, 2026 📦 llama-cpp

✨ 2 features 🐛 1 fix 🔧 1 symbol

Summary

This release adds a Q8_0 reorder optimization to the SYCL backend, improving throughput on Intel Arc GPUs, and fixes a bug that prevented the optimization from activating for Q8_0 tensors.

✨ New Features

Added Q8_0 reorder optimization for SYCL backend, resulting in up to 3x throughput speedup on Intel Arc GPUs.
The Q8_0 reorder optimization separates scale factors from weight data for coalesced memory access.
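The idea of separating scales from weights can be sketched as follows. This is a minimal illustration, not the backend's code: `BlockQ8`, `ReorderedQ8`, and `reorder_q8` are hypothetical names, the scale is held as a raw 16-bit pattern rather than a real half-float type, and placing all weights before all scales is an assumption about the layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative stand-in for a Q8_0 block: one scale plus 32 signed 8-bit weights.
// The scale is kept as a raw 16-bit pattern here (assumption, for portability).
constexpr std::size_t QK = 32;

struct BlockQ8 {
    std::uint16_t d;       // scale factor (fp16 bit pattern)
    std::int8_t   qs[QK];  // quantized weights
};

// Hypothetical reordered layout: all weights packed contiguously, then all scales.
struct ReorderedQ8 {
    std::vector<std::int8_t>   qs;  // blocks.size() * QK weights
    std::vector<std::uint16_t> d;   // one scale per block
};

// Split an interleaved array of blocks into separate weight and scale arrays,
// so that neighbouring work-items can read neighbouring weight bytes.
ReorderedQ8 reorder_q8(const std::vector<BlockQ8>& blocks) {
    ReorderedQ8 out;
    out.qs.resize(blocks.size() * QK);
    out.d.resize(blocks.size());
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        std::memcpy(out.qs.data() + i * QK, blocks[i].qs, QK);
        out.d[i] = blocks[i].d;
    }
    return out;
}
```

In the interleaved form, each run of 32 weights is interrupted by a 2-byte scale, so consecutive weight reads are not uniformly strided; after the split, the weight array is one dense byte stream.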

🐛 Bug Fixes

Fixed an issue where the Q8_0 reorder optimization was silently skipped: Q8_0 was missing from the type check in ggml_backend_sycl_buffer_init_tensor(), the function that allocates the struct holding the reorder flag.
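The failure mode described above can be illustrated with a small self-contained sketch. Everything here is hypothetical (`QuantType`, `ReorderExtra`, `Tensor`, `supports_reorder`, `init_tensor`, `try_reorder`) and does not reproduce the real ggml types or the actual body of ggml_backend_sycl_buffer_init_tensor(). That Q4_0 was already in the check is also an assumption made for illustration.

```cpp
#include <memory>

// Hypothetical stand-ins; these are not the real ggml definitions.
enum class QuantType { F16, Q4_0, Q8_0 };

// Per-tensor bookkeeping that records whether the data has been reordered.
struct ReorderExtra {
    bool reordered = false;
};

struct Tensor {
    QuantType type = QuantType::F16;
    std::unique_ptr<ReorderExtra> extra;  // stays null unless init allocates it
};

// Type check that gates allocation of the bookkeeping struct.
bool supports_reorder(QuantType t) {
    switch (t) {
        case QuantType::Q4_0:
        case QuantType::Q8_0:  // if this case is absent, Q8_0 tensors never get an extra
            return true;
        default:
            return false;
    }
}

// Allocates the bookkeeping struct only for types that pass the check.
void init_tensor(Tensor& t) {
    if (supports_reorder(t.type)) {
        t.extra = std::make_unique<ReorderExtra>();
    }
}

// Later stage: without an extra there is nowhere to record the reorder,
// so the optimization is skipped without any error being raised.
bool try_reorder(Tensor& t) {
    if (!t.extra) {
        return false;
    }
    t.extra->reordered = true;
    return true;
}
```

Because the later stage treats a missing struct as "nothing to do" rather than as an error, leaving a type out of the check produces no crash or warning, only lower performance.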

Affected Symbols

ggml_backend_sycl_buffer_init_tensor