b9291
📦 llama-cpp
✨ 2 features · 🔧 1 symbol
Summary
This release improves MoE prefill throughput on SYCL by reducing the cost of the expert routing calculation from O(n_as * n_routed_rows) to O(n_as + n_routed_rows). It also ships pre-built binaries for a range of hardware and operating system targets.
✨ New Features
- Improved MoE prefill throughput on SYCL by changing `k_copy_src1_to_contiguous` to use a precomputed contiguous mapping.
- Replaced the O(n_as * n_routed_rows) MoE routing calculation with a counting-sort-based procedure that runs in O(n_as + n_routed_rows).
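
The counting-sort idea behind the routing change can be sketched on the CPU as follows. This is a minimal illustration in plain C++, not the SYCL kernel from the release: the `RoutingMap` struct, the `build_routing_map` function, and the `row_expert` input layout are hypothetical names chosen for the example, and only `n_as` and `n_routed_rows` come from the notes above. It assumes each routed row carries a single expert id in the range `[0, n_as)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch (not a llama.cpp type).
struct RoutingMap {
    // Size n_as + 1. Rows routed to expert e occupy slots
    // [expert_offsets[e], expert_offsets[e + 1]) in the contiguous buffer.
    std::vector<int32_t> expert_offsets;
    // Size n_routed_rows. Destination slot of each routed row.
    std::vector<int32_t> row_to_slot;
};

// row_expert[i] is the expert id chosen for routed row i (assumed layout).
inline RoutingMap build_routing_map(const std::vector<int32_t>& row_expert,
                                    int32_t n_as) {
    RoutingMap map;
    map.expert_offsets.assign(static_cast<std::size_t>(n_as) + 1, 0);
    map.row_to_slot.resize(row_expert.size());

    // Pass 1: count rows per expert, O(n_routed_rows).
    for (int32_t e : row_expert) {
        assert(e >= 0 && e < n_as);
        map.expert_offsets[e + 1]++;
    }

    // Pass 2: prefix sum turns counts into start offsets, O(n_as).
    for (int32_t e = 0; e < n_as; ++e) {
        map.expert_offsets[e + 1] += map.expert_offsets[e];
    }

    // Pass 3: stable scatter; each row takes the next free slot of its
    // expert, O(n_routed_rows).
    std::vector<int32_t> cursor(map.expert_offsets.begin(),
                                map.expert_offsets.end() - 1);
    for (std::size_t i = 0; i < row_expert.size(); ++i) {
        map.row_to_slot[i] = cursor[row_expert[i]]++;
    }
    return map;
}
```

A copy step in the spirit of `k_copy_src1_to_contiguous` could then read `row_to_slot[i]` directly instead of rescanning the routing table for every expert, which is where the O(n_as * n_routed_rows) cost would otherwise come from. A parallel device implementation would likely need atomics or a parallel scan for passes 1 and 3; that detail is outside this sketch.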