b8931

📦 llama-cpp
🐛 2 fixes

Summary

This release improves CUDA performance by reducing MMQ stream-k overhead and by switching kbc calculations to 32-bit integers.

🐛 Bug Fixes

  • Reduced MMQ stream-k overhead in CUDA kernels.
  • Switched to using 32-bit integers for kbc calculations in CUDA.
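
As a rough illustration of the kind of index arithmetic involved, the sketch below splits a flattened range of k-blocks evenly across workers using only 32-bit integers. It is not taken from llama.cpp: the names `KRange` and `stream_k_range` are hypothetical, it runs on the host rather than in a CUDA kernel, and it assumes that kbc refers to a flattened k-block index, which these notes do not define.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not from llama.cpp): half-open range of
// k-block indices assigned to one worker under a stream-k style split.
struct KRange {
    uint32_t begin;
    uint32_t end;
};

// total_kb: total number of k-blocks across all output tiles.
// Assumption: total_kb * num_workers fits in 32 bits, so the split
// needs no 64-bit arithmetic.
KRange stream_k_range(uint32_t worker, uint32_t num_workers, uint32_t total_kb) {
    assert(num_workers > 0 && worker < num_workers);
    assert(total_kb <= UINT32_MAX / num_workers);  // guard the 32-bit product

    KRange r;
    r.begin = worker * total_kb / num_workers;        // first k-block for this worker
    r.end   = (worker + 1) * total_kb / num_workers;  // one past the last k-block
    return r;
}
```

With 10 k-blocks and 4 workers, worker 1 receives the range [2, 5). Consecutive workers receive contiguous ranges that together cover every k-block exactly once. The explicit overflow guard is the price of staying in 32-bit arithmetic, which is generally cheaper than 64-bit integer math on GPUs.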