b8658

📦 llama-cpp
✨ 3 features · 🐛 1 fix · 🔧 2 symbols

Summary

This release adds the server flag `--clear-idle`, which saves and clears the KV of idle slots to reduce VRAM usage, and fixes Windows CI.

Migration Steps

  1. If you relied on the previous handling of idle slot KV at request release, adjust your logic or turn off the new default. These notes give the disabling flag as `--no-kv-clear-idle`, which does not match the `--clear-idle` name used elsewhere here, so confirm the exact spelling with `llama-server --help`.

✨ New Features

  • Introduced the server flag `--clear-idle` (enabled by default), which saves and clears the KV of idle slots when a new task arrives.
  • The server now clears idle slots' KV from VRAM using the LLAMA_KV_KEEP_ONLY_ACTIVE logic.
  • The cost of saving KV state is now paid by the finishing request.
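
The behavior described in the list above can be pictured with a small toy model. This is only a sketch of one possible reading of these notes, not llama.cpp's implementation: `Slot`, `ToyServer`, `saved`, and `vram_tokens` are hypothetical names, token lists stand in for KV cache contents, and a plain dictionary stands in for wherever the saved state would live.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """Hypothetical stand-in for a server slot (not llama.cpp's real struct)."""
    slot_id: int
    kv_tokens: list = field(default_factory=list)  # KV held "in VRAM"
    busy: bool = False


class ToyServer:
    """Toy model of save-and-clear for idle slots (assumed semantics)."""

    def __init__(self, n_slots, clear_idle=True):
        self.slots = [Slot(i) for i in range(n_slots)]
        self.clear_idle = clear_idle  # mirrors the idea of `--clear-idle`
        self.saved = {}               # stand-in for saved KV state

    def release(self, slot):
        """The finishing request pays the save cost, then the slot goes idle."""
        if self.clear_idle:
            self.saved[slot.slot_id] = list(slot.kv_tokens)
        slot.busy = False

    def start_task(self, slot_id, prompt):
        """On a new task, drop the KV of every other idle slot from 'VRAM'."""
        target = self.slots[slot_id]
        if self.clear_idle:
            for s in self.slots:
                if s is not target and not s.busy:
                    s.kv_tokens.clear()
        target.kv_tokens = list(prompt)
        target.busy = True
        return target

    def vram_tokens(self):
        """Total number of tokens whose KV is still resident."""
        return sum(len(s.kv_tokens) for s in self.slots)
```

In this model, a slot that finishes a request keeps its KV resident until another task starts; at that point only the active slot's KV remains, while the copy saved at release time is still available. Constructing `ToyServer(n_slots, clear_idle=False)` leaves idle KV resident, which is the trade-off the flag controls.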

🐛 Bug Fixes

  • Fixed Windows CI by dropping the unlink operation for temporary log files.

Affected Symbols