b9141
📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 2 symbols
Summary
This release adds support for the `continue_final_message` request flag in the server and WebUI, aligning with the vLLM API so that a trailing assistant message is continued in place rather than restarted during generation.
✨ New Features
- Server and WebUI now accept the `continue_final_message` request-body flag for compatibility with the vLLM API.
- When `continue_final_message` is true and `add_generation_prompt` is false, the existing `prefill_assistant` code path is triggered, overriding the server-side `opt.prefill_assistant` setting (see the request sketch after this list).
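
A minimal sketch of how a client might exercise the new flag, assuming a local llama-server instance at `http://localhost:8080` exposing the OpenAI-compatible `/v1/chat/completions` route; the URL, port, prompt, and model name here are illustrative, not part of the release notes:

```python
import requests

# Assumed local endpoint; adjust host/port to your llama-server setup.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "Write a haiku about autumn."},
        # Partial assistant turn that generation should pick up mid-message.
        {"role": "assistant", "content": "Golden leaves drift down"},
    ],
    # Continue the final assistant message instead of opening a new turn.
    "continue_final_message": True,
    # Must be false when continuing the final message (see Bug Fixes below).
    "add_generation_prompt": False,
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

With this combination, the completion extends the trailing assistant message rather than starting a fresh assistant turn, matching vLLM's semantics for the same flags.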
🐛 Bug Fixes
- Enforced mutual exclusion between `continue_final_message=true` and `add_generation_prompt=true`: the server now returns HTTP 400 when both are set, matching vLLM behavior (as shown in the sketch below).
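
A corresponding sketch of the rejected combination, under the same local-server assumption; the exact shape of the error body is not specified in these notes:

```python
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local llama-server

bad_payload = {
    "messages": [{"role": "assistant", "content": "Partial reply"}],
    "continue_final_message": True,
    "add_generation_prompt": True,  # contradicts continue_final_message
}

resp = requests.post(URL, json=bad_payload, timeout=60)
assert resp.status_code == 400  # server rejects the contradictory flags
print(resp.text)  # error payload describing the invalid combination
```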