b8685
📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 1 symbol
Summary
This release adds a Q8_0 reorder optimization to the SYCL backend, improving throughput on Intel Arc GPUs by up to 3x, and fixes a bug that silently prevented the optimization from activating for Q8_0 tensors.
✨ New Features
- Added a Q8_0 reorder optimization to the SYCL backend, giving up to a 3x throughput speedup on Intel Arc GPUs.
- The reorder stores scale factors separately from weight data so that GPU memory accesses can be coalesced.
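
To illustrate what separating scales from weights means, here is a minimal host-side sketch. It assumes the usual Q8_0 block of one 16-bit scale followed by 32 int8 weights (34 bytes per block). The names `BlockQ8_0`, `ReorderedQ8_0`, and `reorder_q8_0` are hypothetical, and the exact layout used by the SYCL backend may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int QK8_0 = 32;  // weights per Q8_0 block

// Interleaved layout: one scale followed by its 32 quantized weights.
// The scale is an fp16 value in ggml; it is kept as raw 16 bits here.
struct BlockQ8_0 {
    uint16_t d;
    int8_t   qs[QK8_0];
};

// Hypothetical reordered layout: all weights first, then all scales.
struct ReorderedQ8_0 {
    std::vector<int8_t>   qs;  // n_blocks * QK8_0 contiguous weights
    std::vector<uint16_t> d;   // n_blocks contiguous scales
};

// Split an array of interleaved blocks into two contiguous arrays.
ReorderedQ8_0 reorder_q8_0(const std::vector<BlockQ8_0> & blocks) {
    ReorderedQ8_0 out;
    out.qs.reserve(blocks.size() * QK8_0);
    out.d.reserve(blocks.size());
    for (const BlockQ8_0 & b : blocks) {
        out.qs.insert(out.qs.end(), b.qs, b.qs + QK8_0);
        out.d.push_back(b.d);
    }
    return out;
}
```

With the interleaved layout, consecutive blocks start 34 bytes apart, so adjacent GPU work-items reading weights touch strided, oddly aligned addresses. After the split, the weights form one dense array and the scales another, which is the access pattern that coalescing favors.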
🐛 Bug Fixes
- Fixed an issue where the Q8_0 reorder optimization was silently skipped: Q8_0 was missing from the type check in ggml_backend_sycl_buffer_init_tensor(), the function that allocates the struct holding the reorder flag, so that struct was never allocated for Q8_0 tensors.
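
The failure mode can be sketched with a self-contained mock. Everything here (`TensorType`, `ReorderExtra`, `Tensor`, `init_tensor`, `try_reorder`, and the `include_q8_0` switch) is a hypothetical stand-in rather than the real ggml code. Q4_0 is used only as an assumed example of a type that was already in the check.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the ggml types; not the real definitions.
enum class TensorType { F16, Q4_0, Q8_0 };

struct ReorderExtra {
    bool reordered = false;  // set once the tensor data has been reordered
};

struct Tensor {
    TensorType type;
    std::unique_ptr<ReorderExtra> extra;  // null means "no reorder state"
};

// Types for which the reorder state is allocated. Per the release note,
// Q8_0 was missing from the equivalent check before the fix.
bool supports_reorder(TensorType t, bool include_q8_0) {
    return t == TensorType::Q4_0 || (include_q8_0 && t == TensorType::Q8_0);
}

// Mock of the init step: allocate reorder state only for listed types.
void init_tensor(Tensor & t, bool include_q8_0) {
    if (supports_reorder(t.type, include_q8_0)) {
        t.extra = std::make_unique<ReorderExtra>();
    }
}

// Later stage: reorder only if the state exists; otherwise skip silently.
bool try_reorder(Tensor & t) {
    if (!t.extra) {
        return false;
    }
    t.extra->reordered = true;
    return true;
}
```

Calling `init_tensor` with `include_q8_0 = false` on a Q8_0 tensor leaves `extra` null, so `try_reorder` returns `false` without reporting an error, which mirrors the silent skip described above. With `include_q8_0 = true`, the state is allocated and the reorder proceeds.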