Change8

b8685

📦 llama-cpp
2 features · 🐛 1 fix · 🔧 1 symbol

Summary

This release adds a Q8_0 reorder optimization to the SYCL backend, giving up to 3x higher throughput on Intel Arc GPUs, and fixes a bug that kept the optimization from ever activating for Q8_0 tensors.

✨ New Features

  • Added Q8_0 reorder optimization for SYCL backend, resulting in up to 3x throughput speedup on Intel Arc GPUs.
  • The Q8_0 reorder optimization separates scale factors from weight data for coalesced memory access.
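A minimal sketch of what "separating scale factors from weight data" can look like, written as plain host-side C++ rather than SYCL. The block shape (one 16-bit scale followed by 32 int8 weights) matches ggml's Q8_0 format. The helper names and the "all weights first, then all scales" ordering are assumptions for illustration and are not taken from the backend source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr int QK8_0 = 32;  // weights per Q8_0 block

// Stand-in for a Q8_0 block. ggml stores the scale as fp16; the raw
// 16 bits are kept in a uint16_t here to avoid extra dependencies.
struct BlockQ8_0 {
    uint16_t d;          // scale factor (fp16 bit pattern)
    int8_t   qs[QK8_0];  // quantized weights
};
static_assert(sizeof(BlockQ8_0) == 34, "scale and weights are interleaved");

// Hypothetical helper: turn [d0 qs0][d1 qs1]... into [qs0 qs1 ...][d0 d1 ...].
// Ordering (weights first, then scales) is an assumption of this sketch.
std::vector<uint8_t> reorder_q8_0(const std::vector<BlockQ8_0>& blocks) {
    const std::size_t n = blocks.size();
    const std::size_t qs_bytes = n * QK8_0;
    std::vector<uint8_t> out(qs_bytes + n * sizeof(uint16_t));
    for (std::size_t i = 0; i < n; ++i) {
        // Weights of block i land at a 32-byte stride, with no scale in between.
        std::memcpy(out.data() + i * QK8_0, blocks[i].qs, QK8_0);
        // Scales are packed back to back after the last weight.
        std::memcpy(out.data() + qs_bytes + i * sizeof(uint16_t),
                    &blocks[i].d, sizeof(uint16_t));
    }
    return out;
}

// Hypothetical accessor: read weight j of a block from the reordered buffer.
int8_t weight_at(const std::vector<uint8_t>& buf, std::size_t n_blocks,
                 std::size_t block, int j) {
    assert(block < n_blocks && j >= 0 && j < QK8_0);
    return static_cast<int8_t>(buf[block * QK8_0 + j]);
}
```

In the interleaved layout every block starts at a 34-byte stride, so neighbouring threads reading neighbouring blocks touch addresses that are neither contiguous nor well aligned. After the split, weights sit in one contiguous run and scales in another, which is the access pattern that coalesced loads need.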

🐛 Bug Fixes

  • Fixed an issue where the Q8_0 reorder optimization was silently skipped. Q8_0 was missing from the type check in ggml_backend_sycl_buffer_init_tensor() that gates allocation of the struct holding the reorder flag, so that struct was never created for Q8_0 tensors.
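The failure mode can be illustrated with a small, self-contained sketch. Every name below (QuantType, TensorExtra, supports_reorder, init_tensor, try_reorder) is hypothetical and only mirrors the shape of the bug described above. It is not the code in ggml_backend_sycl_buffer_init_tensor().

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins, not ggml types.
enum class QuantType { F16, Q4_0, Q8_0 };

struct TensorExtra {
    bool reordered = false;  // the "reorder flag"
};

struct Tensor {
    QuantType type = QuantType::F16;
    std::unique_ptr<TensorExtra> extra;  // only allocated for gated types
};

// The type check. Dropping the Q8_0 case reproduces the described bug:
// no extra struct is allocated, so the reorder step has nothing to flag.
bool supports_reorder(QuantType t) {
    switch (t) {
        case QuantType::Q4_0:
        case QuantType::Q8_0:  // the kind of entry the fix adds
            return true;
        default:
            return false;
    }
}

void init_tensor(Tensor& t) {
    assert(!t.extra);  // this sketch expects init to run once per tensor
    if (supports_reorder(t.type)) {
        t.extra = std::make_unique<TensorExtra>();
    }
}

// Returns false without any error when the extra struct is absent,
// which is why a missing type in the gate goes unnoticed.
bool try_reorder(Tensor& t) {
    if (!t.extra) {
        return false;
    }
    t.extra->reordered = true;
    return true;
}
```

Because the skip path returns quietly instead of failing, results stay correct and only throughput suffers, which is consistent with the optimization having been skipped silently.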

Affected Symbols