b8853
📦 llama-cpp
🐛 2 fixes · 🔧 2 symbols
Summary
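This release fixes a critical assertion failure in the SYCL reorder MMVQ dispatchers, triggered by models whose vocabulary size is not divisible by 16, by replacing the assert with padding logic. It also makes the subgroup size dependency explicit in the MMVQ launch helpers by using WARP_SIZE instead of a hardcoded 16.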
🐛 Bug Fixes
- Fixed an assertion failure in SYCL reorder MMVQ dispatchers (Q4_0, Q8_0, Q4_K, Q6_K) that occurred when models had a vocabulary size not divisible by 16, by replacing the assert with padding logic.
- Replaced the hardcoded subgroup size of 16 with WARP_SIZE in the four reorder_mul_mat_vec launch helpers for SYCL kernels (Q4_0, Q8_0, Q4_K, Q6_K) to make the dependency on subgroup size explicit.