b7645
📦 llama-cpp
✨ 4 features · 🐛 2 fixes · 🔧 1 symbol
Summary
This release tunes ROCm performance on RDNA architectures by adjusting how mmq.cu switches between the mmq kernels and the rocBLAS and WMMA paths, recovering a recent performance regression.
✨ New Features
- Added n_experts branch logic, similar to the existing CDNA path, in mmq.cu.
- Tuned mmq/rocblas switching for RDNA architectures in mmq.cu.
- Tuned mmq/wmma switching for RDNA architectures in mmq.cu.
- Moved the AMD WMMA mmq/wmma switching behind the IS_RDNA3 flag in mmq.cu.
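
The switching described in the items above can be pictured with a minimal sketch. It is not the code in mmq.cu: the enum names, the `choose_path` function, and both thresholds are hypothetical placeholders chosen for illustration. It only shows the general shape of a heuristic that picks between an mmq kernel and a dense (rocBLAS or WMMA style) path based on architecture, batch size, and n_experts.

```cpp
#include <cstdint>

// Hypothetical architecture tags; not the actual ggml definitions.
enum class GpuArch { CDNA, RDNA3, Other };

// Hypothetical choices for a quantized matrix multiplication.
enum class MatmulPath { MMQ, Dense };

// Placeholder thresholds, illustrative only. The tuned values live in mmq.cu.
constexpr int64_t kDenseBatchLimit = 128;
constexpr int64_t kMoeBatchLimit   = 256;

// Pick a path for a given architecture, batch size, and expert count.
// Assumption: on RDNA3, small batches stay on MMQ and large batches go to
// the dense path, with a separate (here: larger) limit when n_experts > 1.
inline MatmulPath choose_path(GpuArch arch, int64_t batch, int64_t n_experts) {
    if (arch == GpuArch::RDNA3) {
        const int64_t limit = (n_experts > 1) ? kMoeBatchLimit : kDenseBatchLimit;
        return batch <= limit ? MatmulPath::MMQ : MatmulPath::Dense;
    }
    // Other architectures are outside the scope of this sketch: default to MMQ.
    return MatmulPath::MMQ;
}
```

Under these placeholder thresholds, `choose_path(GpuArch::RDNA3, 200, 1)` selects the dense path, while the same batch with `n_experts = 8` stays on MMQ. Keeping the decision in one small function of this kind makes it straightforward to retune per architecture.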
🐛 Bug Fixes
- Patched a performance regression in the mmq kernels on ROCm.
- Recovered the performance lost to the regression tracked in issue #17917.
🔧 Affected Symbols
mmq.cu