b8931
📦 llama-cpp
🐛 2 fixes
Summary
This release focuses on CUDA performance. It reduces the overhead of stream-k scheduling in the MMQ (quantized matrix multiplication) kernels and switches the kbc index calculations to 32-bit integers.
🐛 Bug Fixes
- Reduced stream-k overhead in the CUDA MMQ kernels.
- Switched kbc calculations in the CUDA code to 32-bit integers.
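As a rough illustration of what stream-k scheduling involves, the Python sketch below shows the generic technique. The flattened space of (output tile, k-block) iterations is split into equal contiguous ranges, one per worker, so a worker's range can begin or end partway through a tile. All function names are hypothetical. Treating kbc as the flattened k-block index is an assumption, and this is not llama.cpp's actual implementation. The fits_in_int32 helper shows the bound that would have to hold for the range arithmetic to be safe in 32-bit integers under this scheme.

```python
INT32_MAX = 2**31 - 1


def stream_k_ranges(num_tiles, k_blocks_per_tile, num_workers):
    """Split the flattened (tile, k-block) space into one [start, stop) range per worker.

    Hypothetical helper: generic stream-k partitioning, not llama.cpp code.
    """
    total = num_tiles * k_blocks_per_tile
    # Integer division yields contiguous ranges whose sizes differ by at most one.
    return [(w * total // num_workers, (w + 1) * total // num_workers)
            for w in range(num_workers)]


def tile_segments(start, stop, k_blocks_per_tile):
    """Break one worker's range into (tile, kb_start, kb_stop) pieces.

    kbc is assumed here to be the flattened k-block index.
    """
    segments = []
    kbc = start
    while kbc < stop:
        tile, kb0 = divmod(kbc, k_blocks_per_tile)
        # Stop at the end of the current tile or the end of this worker's range.
        kb_stop = min(k_blocks_per_tile, kb0 + (stop - kbc))
        segments.append((tile, kb0, kb_stop))
        kbc += kb_stop - kb0
    return segments


def fits_in_int32(num_tiles, k_blocks_per_tile, num_workers):
    """True if the largest intermediate, num_workers * total, fits in a signed 32-bit int."""
    return num_workers * num_tiles * k_blocks_per_tile <= INT32_MAX


ranges = stream_k_ranges(num_tiles=3, k_blocks_per_tile=4, num_workers=5)
print(ranges)
print(tile_segments(*ranges[3], k_blocks_per_tile=4))
```

In this example, the fourth worker's range covers the last k-block of tile 1 and the first k-block of tile 2. In stream-k schemes generally, such partial tiles are merged in a separate fix-up step, which is one place where scheduling overhead arises.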