b9133
📦 llama-cpp
✨ 2 features · 🔧 2 symbols
Summary
This release introduces support for continuing generation on reasoning models in the server and WebUI by adjusting how prefilled content and reasoning tags are handled. Channel-based templates are explicitly excluded from this new prefill support for now.
Migration Steps
- Users relying on channel-based templates (such as GPT-OSS) for reasoning prefill should note that these templates are currently out of scope; support is pending a per-template prefill API.
✨ New Features
- Server and WebUI now support continuing generation on reasoning models by orchestrating thinking tags around prefilled messages.
- The WebUI removes the reasoning guard from the Continue button, sends `reasoning_content` along with the prefilled message, and persists partial reasoning on stop so it survives page reload and resume.
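To illustrate the feature above, here is a minimal sketch of what a continue-generation request might look like against the server's OpenAI-compatible chat endpoint. The model name, user prompt, and the exact string values are placeholders; the key idea from the release notes is that the trailing assistant message carries both its partial `content` and its partial `reasoning_content`, letting the server re-wrap the thinking tags and resume generation.

```python
import json

# Hypothetical payload for POST /v1/chat/completions on a llama.cpp server.
# The final assistant message is the prefill: the server continues from it
# instead of starting a fresh reply. "reasoning_content" holds the partial
# reasoning persisted by the WebUI on stop (field placement is an
# assumption based on these release notes, not a verified schema).
payload = {
    "model": "any-reasoning-model",
    "messages": [
        {"role": "user", "content": "Prove that 17 is prime."},
        {
            "role": "assistant",
            # Partial visible answer to continue from:
            "content": "17 is prime because",
            # Partial reasoning saved when the user pressed stop:
            "reasoning_content": "Check divisors up to sqrt(17): 2, 3 ...",
        },
    ],
}

body = json.dumps(payload)
# The serialized body round-trips; the last message is the assistant prefill.
last = json.loads(body)["messages"][-1]
print(last["role"])
```

Sending this with the last message in the assistant role (rather than appending a new user turn) is what distinguishes a "continue" request from a normal completion.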