
b9133

📦 llama-cpp
✨ 2 features · 🔧 2 symbols

Summary

This release introduces support for continuing generation on reasoning models in the server and WebUI by adjusting how prefilled content and reasoning tags are handled. Channel-based templates are explicitly excluded from this new prefill support for now.

Migration Steps

  1. If you rely on channel-based templates (such as GPT-OSS) for reasoning prefill, note that this functionality is currently out of scope; support is pending a per-template prefill API.

✨ New Features

  • Server and WebUI now support continuing generation on reasoning models by orchestrating thinking tags around prefilled messages.
  • WebUI drops the reasoning guard on the Continue button, sends reasoning_content with the prefilled message, and persists partial reasoning on stop so that it survives a page reload and can be resumed.
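A minimal sketch of what a "continue" request might look like under these changes, assuming the server's OpenAI-compatible chat schema: the conversation ends with a prefilled assistant turn carrying a reasoning_content field, and the server resumes generation from there. The helper name and the exact payload shape here are illustrative, not taken from the release itself.

```python
import json

def build_continue_request(history, partial_reply, partial_reasoning):
    """Hypothetical helper: append a prefilled assistant turn so the
    server can wrap the reasoning in thinking tags and continue
    generating from the prefilled content."""
    messages = list(history)
    messages.append({
        "role": "assistant",
        "content": partial_reply,                # prefilled visible text
        "reasoning_content": partial_reasoning,  # persisted partial reasoning
    })
    return {"messages": messages}

payload = build_continue_request(
    [{"role": "user", "content": "Prove that 17 is prime."}],
    "17 is prime because",
    "Check divisibility by primes up to sqrt(17): only 2 and 3 matter.",
)
print(json.dumps(payload, indent=2))
```

The key point the feature description implies is that the partial reasoning travels with the prefilled message instead of being discarded, which is what lets a stopped generation survive a reload and resume.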

Affected Symbols