b8183
📦 llama-cpp
🐛 1 fix · 🔧 1 symbol
Summary
This release fixes the CUDA non-contiguous dequantize/convert kernels by capping grid.y at 65535, the largest value CUDA accepts for the y dimension of a launch grid. It also provides updated pre-built binaries for numerous platforms.
🐛 Bug Fixes
- Capped grid.y at 65535 in non-contiguous dequantize/convert kernels on CUDA.
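
To illustrate why a cap like this can stay correct, here is a minimal host-side C++ sketch, not the actual llama.cpp change. It assumes the kernel pairs the capped grid.y with a stride loop over rows, which these release notes do not confirm. The names `MAX_GRID_Y`, `capped_grid_y`, and `simulate_row_visits` are hypothetical and exist only for this example; the simulation runs on the CPU, so no GPU is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// CUDA limits gridDim.y (and gridDim.z) to 65535; gridDim.x may be far larger.
constexpr int64_t MAX_GRID_Y = 65535;

// Hypothetical helper: number of blocks to launch along y for `rows` rows.
// Without the cap, rows > 65535 would produce an invalid launch configuration.
inline int64_t capped_grid_y(int64_t rows) {
    return std::min<int64_t>(rows, MAX_GRID_Y);
}

// CPU stand-in for the kernel body (an assumption about how the cap is used):
// each simulated block starts at its own blockIdx.y and advances by gridDim.y
// until it runs past the last row. Returns how often each row was visited.
inline std::vector<int> simulate_row_visits(int64_t rows) {
    std::vector<int> visits(static_cast<std::size_t>(rows), 0);
    const int64_t grid_y = capped_grid_y(rows);
    for (int64_t block_y = 0; block_y < grid_y; ++block_y) {
        for (int64_t row = block_y; row < rows; row += grid_y) {
            ++visits[static_cast<std::size_t>(row)];
        }
    }
    return visits;
}
```

With this pattern, a tensor with more than 65535 rows still launches with a valid grid, and each row is handled by exactly one simulated block. In a real kernel the inner loop would use `blockIdx.y` and `gridDim.y` in place of `block_y` and `grid_y`.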