b7645

📦 llama-cpp
4 features · 🐛 2 fixes · 🔧 1 symbol

Summary

This release tunes performance on AMD RDNA GPUs under ROCm. It adjusts the logic in mmq.cu that switches between the mmq kernels and the rocBLAS and WMMA paths, and it resolves a recent performance regression.

✨ New Features

  • Added n_experts branch logic, similar to the existing CDNA path, in mmq.cu.
  • Tuned mmq/rocBLAS switching for RDNA architectures in mmq.cu.
  • Tuned mmq/WMMA switching for RDNA architectures in mmq.cu.
  • Moved the AMD WMMA mmq/WMMA switching behind the IS_RDNA3 flag in mmq.cu.
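
To make the idea of kernel switching concrete, here is a minimal sketch of an architecture-gated heuristic with a separate n_experts branch. It is not the code in mmq.cu: the `GpuArch` enum, the `should_use_mmq` function, and both threshold constants are hypothetical names and values chosen for illustration, and the real logic may depend on other inputs such as the quantization type.

```cpp
#include <cassert>

// Hypothetical GPU families for this sketch. The real code derives the
// architecture from the device's compute capability.
enum class GpuArch { CDNA, RDNA3, Other };

// Illustrative thresholds only; these are NOT the tuned values in mmq.cu.
constexpr int kDenseBatchLimit = 128;  // assumed cutoff for dense models
constexpr int kMoeBatchLimit   = 512;  // assumed cutoff when n_experts > 0

// Returns true to pick the quantized mmq kernel and false to fall back to
// the alternative path (for example a rocBLAS or WMMA based one).
inline bool should_use_mmq(GpuArch arch, int batch_size, int n_experts) {
    assert(batch_size > 0 && n_experts >= 0);

    // Only the architectures with a tuned path get a batch-size cutoff.
    if (arch == GpuArch::RDNA3 || arch == GpuArch::CDNA) {
        // Mixture-of-experts models get their own, larger cutoff.
        const int limit = (n_experts > 0) ? kMoeBatchLimit : kDenseBatchLimit;
        return batch_size <= limit;
    }

    // Everything else keeps using mmq unconditionally in this sketch.
    return true;
}
```

Under these assumed thresholds, `should_use_mmq(GpuArch::RDNA3, 256, 0)` selects the fallback path, while the same batch size with `n_experts = 8` stays on mmq. Gating the branch on the architecture keeps the tuning from affecting other GPUs.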

🐛 Bug Fixes

  • Patched a performance regression in the mmq kernels on ROCm.
  • Recovered the performance lost to the regression related to issue #17917.

🔧 Affected Symbols

mmq.cu