b9141
📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 2 symbols
Summary
This release adds support for the `continue_final_message` request flag in the server and WebUI, aligning with the vLLM API so that a trailing assistant message is continued in place rather than restarted during generation.
✨ New Features
- Server and WebUI now accept the `continue_final_message` request-body flag for compatibility with the vLLM API.
- When `continue_final_message` is true and `add_generation_prompt` is false, the existing `prefill_assistant` code path is triggered, overriding the server-side `opt.prefill_assistant` setting (see the request sketch after this list).
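
A minimal sketch of how a client might exercise the new flag, assuming a local llama-server instance at `http://localhost:8080` exposing the OpenAI-compatible `/v1/chat/completions` route; the URL, port, prompt, and model name here are illustrative, not part of the release notes:

```python
import requests

# Assumed local endpoint; adjust host/port to your llama-server setup.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "Write a haiku about autumn."},
        # Partial assistant turn that generation should pick up mid-message.
        {"role": "assistant", "content": "Golden leaves drift down"},
    ],
    # Continue the final assistant message instead of opening a new turn.
    "continue_final_message": True,
    # Must be false when continuing the final message (see Bug Fixes below).
    "add_generation_prompt": False,
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

With this combination, the completion extends the trailing assistant message rather than starting a fresh assistant turn, matching vLLM's semantics for the same flags.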
🐛 Bug Fixes
- Enforced mutual exclusion between `continue_final_message=true` and `add_generation_prompt=true`: the server now returns HTTP 400 when both are set, matching vLLM behavior (as shown in the sketch below).
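
A corresponding sketch of the rejected combination, under the same local-server assumption; the exact shape of the error body is not specified in these notes:

```python
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local llama-server

bad_payload = {
    "messages": [{"role": "assistant", "content": "Partial reply"}],
    "continue_final_message": True,
    "add_generation_prompt": True,  # contradicts continue_final_message
}

resp = requests.post(URL, json=bad_payload, timeout=60)
assert resp.status_code == 400  # server rejects the contradictory flags
print(resp.text)  # error payload describing the invalid combination
```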