b7645

📅 Jan 6, 2026 · 📦 llama-cpp

✨ 4 features · 🐛 2 fixes · 🔧 1 symbol

Summary

This release tunes ROCm performance on AMD RDNA architectures by adjusting the logic that switches between the MMQ kernels and other matrix-multiply paths, resolving a recent performance regression.

✨ New Features

Added n_experts branch logic in mmq.cu, similar to the existing CDNA path.
Tuned MMQ/rocBLAS switching for RDNA architectures in mmq.cu.
Tuned MMQ/WMMA switching for RDNA architectures in mmq.cu.
Moved the AMD WMMA MMQ/WMMA switching behind the IS_RDNA3 flag in mmq.cu.
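These notes do not spell out the actual heuristics. Purely as an illustration of the *shape* of the switching described above (a batch-size choice between MMQ and a rocBLAS or WMMA path, a separate n_experts branch, and WMMA switching gated behind an RDNA3 flag), here is a minimal Python sketch. The function `choose_kernel`, the `Kernel` enum, and all threshold values are hypothetical placeholders, not code or tuned values from mmq.cu.

```python
from enum import Enum


class Kernel(Enum):
    MMQ = "mmq"       # quantized matrix-multiply kernel
    BLAS = "rocblas"  # dequantize, then a rocBLAS GEMM
    WMMA = "wmma"     # WMMA-based kernel


# Hypothetical placeholder thresholds: NOT the tuned values in mmq.cu.
MMQ_MAX_BATCH = 64
MMQ_MAX_BATCH_MOE = 256
MMQ_MAX_BATCH_WMMA = 128


def choose_kernel(is_rdna: bool, is_rdna3: bool, batch: int, n_experts: int = 0) -> Kernel:
    """Hypothetical dispatch: pick a matmul path from arch flags, batch size, and expert count."""
    if not is_rdna:
        # Other architectures are out of scope for this sketch.
        return Kernel.MMQ
    if n_experts > 0:
        # Separate branch for mixture-of-experts matmuls, analogous to an n_experts check.
        return Kernel.MMQ if batch <= MMQ_MAX_BATCH_MOE else Kernel.BLAS
    if is_rdna3:
        # MMQ/WMMA switching is only considered when the RDNA3 flag is set.
        return Kernel.MMQ if batch <= MMQ_MAX_BATCH_WMMA else Kernel.WMMA
    # Remaining RDNA parts: switch between MMQ and rocBLAS by batch size.
    return Kernel.MMQ if batch <= MMQ_MAX_BATCH else Kernel.BLAS
```

Ordering the n_experts check before the RDNA3 gate keeps the mixture-of-experts decision independent of the WMMA path; the real code may order or combine these conditions differently.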

🐛 Bug Fixes

Patched a performance regression in the MMQ kernels on ROCm.
Recovered the performance lost in the regression related to issue #17917.

🔧 Affected Symbols

mmq.cu