b9133
📦 llama-cpp
✨ 2 features · 🔧 2 symbols
Summary
This release introduces support for continuing generation on reasoning models in the server and WebUI by adjusting how prefilled content and reasoning tags are handled. Channel-based templates are explicitly excluded from this new prefill support for now.
Migration Steps
- Users relying on channel-based templates (such as GPT-OSS) for reasoning prefill should note that these templates are currently out of scope; support is pending a per-template prefill API.
✨ New Features
- Server and WebUI now support continuing generation on reasoning models by orchestrating thinking tags around prefilled messages.
- The WebUI removes the reasoning guard from the Continue button, sends `reasoning_content` along with the prefilled message, and persists partial reasoning on stop so it survives page reload and resume.
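To illustrate the feature above, here is a minimal sketch of what a continue-generation request might look like against the server's OpenAI-compatible chat endpoint. The model name, user prompt, and the exact string values are placeholders; the key idea from the release notes is that the trailing assistant message carries both its partial `content` and its partial `reasoning_content`, letting the server re-wrap the thinking tags and resume generation.

```python
import json

# Hypothetical payload for POST /v1/chat/completions on a llama.cpp server.
# The final assistant message is the prefill: the server continues from it
# instead of starting a fresh reply. "reasoning_content" holds the partial
# reasoning persisted by the WebUI on stop (field placement is an
# assumption based on these release notes, not a verified schema).
payload = {
    "model": "any-reasoning-model",
    "messages": [
        {"role": "user", "content": "Prove that 17 is prime."},
        {
            "role": "assistant",
            # Partial visible answer to continue from:
            "content": "17 is prime because",
            # Partial reasoning saved when the user pressed stop:
            "reasoning_content": "Check divisors up to sqrt(17): 2, 3 ...",
        },
    ],
}

body = json.dumps(payload)
# The serialized body round-trips; the last message is the assistant prefill.
last = json.loads(body)["messages"][-1]
print(last["role"])
```

Sending this with the last message in the assistant role (rather than appending a new user turn) is what distinguishes a "continue" request from a normal completion.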