b9489
📦 llama-cpp
✨ 1 feature · 🐛 1 fix · 🔧 1 symbol
Summary
This release adds a CUDA optimization that reserves space for the quantized KV-cache at startup, and removes an assertion in ggml-cuda.cu.
✨ New Features
- CUDA: Reserve space for quantized KV-cache at startup.
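
For a rough sense of how much memory a quantized KV-cache occupies, the following is a back-of-the-envelope sketch, not the reservation logic from this release (which these notes do not describe). It assumes the ggml `q8_0` and `q4_0` block layouts (32 elements per block, stored in 34 and 18 bytes respectively); the function name `kv_cache_bytes` and the model shape in the usage lines are hypothetical.

```python
# Rough size estimate for a quantized KV-cache.
# Assumption: ggml block layouts, 32 elements per block.
#   q8_0: 2-byte fp16 scale + 32 int8 values          = 34 bytes
#   q4_0: 2-byte fp16 scale + 16 bytes of 4-bit values = 18 bytes
BLOCK_ELEMS = 32
BLOCK_BYTES = {"q8_0": 34, "q4_0": 18}


def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim,
                   k_type="q8_0", v_type="q8_0"):
    """Estimate total bytes for the K and V caches (hypothetical helper)."""
    row = n_kv_heads * head_dim  # elements per token, per layer, for K (same for V)
    if row % BLOCK_ELEMS != 0:
        raise ValueError("row size must be a multiple of the block size")
    blocks_per_token = row // BLOCK_ELEMS
    per_token = blocks_per_token * (BLOCK_BYTES[k_type] + BLOCK_BYTES[v_type])
    return n_layers * n_ctx * per_token


# Hypothetical model shape: 32 layers, 4096-token context, 8 KV heads of dim 128.
size_q8 = kv_cache_bytes(32, 4096, 8, 128)
size_q4 = kv_cache_bytes(32, 4096, 8, 128, k_type="q4_0", v_type="q4_0")
print(size_q8 / 2**20, "MiB with q8_0")
print(size_q4 / 2**20, "MiB with q4_0")
```

The estimate covers only the cache tensors themselves; any scratch buffers the backend needs while working with a quantized cache are not included.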
🐛 Bug Fixes
- Removed an assertion in ggml-cuda.cu.