b9489

📅 Jun 3, 2026 📦 llama-cpp

✨ 1 feature 🐛 1 fix 🔧 1 symbol

Summary

This release makes the CUDA backend reserve space for the quantized KV-cache at startup, and removes an assertion in ggml-cuda.cu.

✨ New Features

CUDA: Reserve space for quantized KV-cache at startup.

🐛 Bug Fixes

Removed an assertion in ggml-cuda.cu.

🔧 Affected Symbols

ggml-cuda.cu