
b9489

📦 llama-cpp
1 feature · 1 fix · 1 symbol

Summary

This release makes the CUDA backend reserve space for the quantized KV cache at startup and removes an assertion from ggml-cuda.cu.

✨ New Features

  • CUDA: Reserve space for quantized KV-cache at startup.

🐛 Bug Fixes

  • Removed an assertion in ggml-cuda.cu.

Affected Symbols