b8658
📦 llama-cpp
✨ 3 features · 🐛 1 fix · 🔧 2 symbols
Summary
This release adds the server flag `--clear-idle`, which saves and clears the KV of idle slots to reduce VRAM usage, and includes a fix for Windows CI.
Migration Steps
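- `--clear-idle` is enabled by default. If you relied on the previous behavior of clearing idle slot KV upon request release, adjust your logic accordingly, or disable the new default with `--no-kv-clear-idle` (the spelling given in these notes; run `llama-server --help` to confirm the exact flag name in your build).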
✨ New Features
- Introduced the server flag `--clear-idle` (enabled by default), which saves and clears the KV of idle slots when a new task arrives.
- The server now clears the KV of idle slots from VRAM using LLAMA_KV_KEEP_ONLY_ACTIVE logic.
- The cost associated with saving KV state is now paid by the finishing request.
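
The "save and clear idle slots on a new task" behavior can be pictured with a small toy model. This is a conceptual sketch only, not llama.cpp's actual code: `Slot`, `SlotPool`, `start_task`, and `finish_task` are hypothetical names, "VRAM" is simulated with a Python list, and the model covers only the new-task trigger described in the first bullet.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Slot:
    """Hypothetical stand-in for a server slot; not a llama.cpp type."""
    slot_id: int
    busy: bool = False
    vram_kv: list[int] = field(default_factory=list)  # tokens whose KV is "in VRAM"


@dataclass
class SlotPool:
    """Toy pool of slots; slot_id is assumed to equal the list index."""
    slots: list[Slot]
    clear_idle: bool = True  # mirrors the enabled-by-default flag (assumption)
    saved_kv: dict[int, list[int]] = field(default_factory=dict)  # host-side copies

    def start_task(self, slot_id: int, tokens: list[int]) -> None:
        """Begin a task on one slot, first saving and clearing every idle slot."""
        if self.clear_idle:
            for slot in self.slots:
                if slot.slot_id != slot_id and not slot.busy and slot.vram_kv:
                    self.saved_kv[slot.slot_id] = slot.vram_kv  # save first
                    slot.vram_kv = []                           # then free "VRAM"
        target = self.slots[slot_id]
        target.busy = True
        target.vram_kv = list(tokens)

    def finish_task(self, slot_id: int) -> None:
        """Mark a slot idle; its KV stays resident until the next task arrives."""
        self.slots[slot_id].busy = False


pool = SlotPool(slots=[Slot(0), Slot(1)])
pool.start_task(0, [1, 2, 3])
pool.finish_task(0)
pool.start_task(1, [4, 5])    # slot 0 is idle, so its KV is saved and cleared
print(pool.slots[0].vram_kv)  # []
print(pool.saved_kv[0])       # [1, 2, 3]
```

With `clear_idle=False`, the same sequence leaves slot 0's KV resident, which is the trade-off the flag controls in this toy model: lower VRAM use versus keeping idle state immediately available.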
🐛 Bug Fixes
- Fixed Windows CI by dropping the unlink operation for temporary log files.