
b9291

📦 llama-cpp
2 features · 🔧 1 symbol

Summary

This release improves MoE prefill throughput on SYCL by reducing the cost of the expert routing calculation from O(n_as * n_routed_rows) to O(n_as + n_routed_rows). Pre-built binaries are also provided for a range of hardware and operating system targets.

✨ New Features

  • Improved MoE prefill throughput on SYCL by changing `k_copy_src1_to_contiguous` to use a precomputed contiguous mapping.
  • Replaced the O(n_as * n_routed_rows) MoE routing calculation with a counting-sort-based procedure that runs in O(n_as + n_routed_rows).
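
The counting-sort idea behind the second item can be illustrated with a minimal host-side sketch. This is not the SYCL kernel code from the release: the names `ExpertRowMap` and `build_expert_row_map` are hypothetical, and the sketch assumes that `n_as` is the number of experts and that each routed row carries a single expert id. It builds a contiguous per-expert row mapping in three linear passes instead of rescanning every routed row once per expert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type (not from llama.cpp): routed rows grouped by expert.
struct ExpertRowMap {
    std::vector<int32_t> offsets;  // size n_as + 1; expert e owns rows[offsets[e] .. offsets[e + 1])
    std::vector<int32_t> rows;     // routed-row indices, stored contiguously per expert
};

// ids[i] is the expert selected for routed row i (assumed layout, for illustration only).
// Total work is O(n_as + n_routed_rows): one histogram pass, one prefix sum, one scatter.
ExpertRowMap build_expert_row_map(const std::vector<int32_t>& ids, int32_t n_as) {
    ExpertRowMap m;
    m.offsets.assign(static_cast<std::size_t>(n_as) + 1, 0);
    m.rows.resize(ids.size());

    // Pass 1: count the rows routed to each expert, stored one slot to the right.
    for (int32_t e : ids) {
        assert(e >= 0 && e < n_as);
        ++m.offsets[e + 1];
    }

    // Pass 2: in-place prefix sum turns the counts into start offsets.
    for (int32_t e = 0; e < n_as; ++e) {
        m.offsets[e + 1] += m.offsets[e];
    }

    // Pass 3: scatter each row index into its expert's contiguous range.
    std::vector<int32_t> cursor(m.offsets.begin(), m.offsets.end() - 1);
    for (std::size_t i = 0; i < ids.size(); ++i) {
        m.rows[cursor[ids[i]]++] = static_cast<int32_t>(i);
    }
    return m;
}
```

With a mapping of this shape precomputed, a copy step can read its source row directly from `rows` rather than searching for it, which is the general effect the first item describes for `k_copy_src1_to_contiguous`. Rows keep their original relative order within each expert because the scatter pass is stable.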

Affected Symbols