Change8

b9468

📦 llama-cpp
3 features · 🐛 3 fixes · 🔧 11 symbols

Summary

This release adds real-time reasoning interruption through a new control endpoint, so a client can end the reasoning phase of a generation that is still in progress. Control actions now target a completion by its completion ID rather than by slot ID, which avoids acting on a slot that may since have been reused for a different completion.

Migration Steps

  1. If you use the reasoning interruption feature, update control requests to identify the target by completion ID instead of id_slot.
  2. If you rely on the skip button being available during generation, note that it now appears only during the reasoning phase.
  3. The reasoning control action is now named "reasoning_end"; update any request that still sends the previous name.
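
As a rough illustration of a migrated control request, the sketch below builds (but does not send) a POST to the control endpoint path listed under New Features. The host, port, example completion ID, and the JSON field name `id` for the completion ID are assumptions made for illustration; these notes do not state the exact request schema, so check the server documentation before relying on it.

```python
import json
import urllib.request

# Endpoint path as given in these release notes; host and port are assumptions.
CONTROL_PATH = "/v1/chat/completions/control"


def build_reasoning_end_request(base_url: str, completion_id: str) -> urllib.request.Request:
    """Build (without sending) a control request that ends the reasoning phase."""
    payload = {
        "id": completion_id,        # hypothetical field name for the completion ID
        "action": "reasoning_end",  # action name given in these release notes
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + CONTROL_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "chatcmpl-example" is a placeholder, not a real completion ID.
req = build_reasoning_end_request("http://localhost:8080", "chatcmpl-example")
# To send it against a running server: urllib.request.urlopen(req)
```

The completion ID would normally come from the streamed response of the generation being interrupted, which is why the request is keyed on it rather than on a slot number.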

✨ New Features

  • Added real-time reasoning interruption through a new control endpoint (POST /v1/chat/completions/control).
  • The control endpoint takes a target and an action, and arms the budget sampler on demand using the reasoning_control flag. The target was introduced as id_slot and is now the completion ID (see Bug Fixes).
  • The skip button in the WebUI now keys off the new isReasoning state, so it is shown only during the thinking phase.

🐛 Bug Fixes

  • The control endpoint now targets reasoning control by completion ID instead of slot ID, preventing time-of-check to time-of-use (TOCTOU) races.
  • Fixed an issue in the agentic flow where the completion ID was not relayed, which prevented the control request from being sent.
  • The reasoning control model is now read from the streaming message instead of from the unrelated model dropdown UI state, for consistency.
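
To show why keying on the completion ID avoids the TOCTOU problem, here is a toy model of the race. It is not the server's implementation: the `Slot` class and both helper functions are hypothetical and exist only to illustrate the difference between the two targeting schemes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Slot:
    slot_id: int
    completion_id: Optional[str] = None  # completion currently running in this slot
    reasoning_ended: bool = False


def end_reasoning_by_slot(slots: Dict[int, Slot], slot_id: int) -> Optional[str]:
    """Slot-keyed control: acts on whatever completion occupies the slot now."""
    slot = slots[slot_id]
    slot.reasoning_ended = True
    return slot.completion_id


def end_reasoning_by_completion(slots: Dict[int, Slot], completion_id: str) -> Optional[str]:
    """Completion-keyed control: acts only if that completion is still running."""
    for slot in slots.values():
        if slot.completion_id == completion_id:
            slot.reasoning_ended = True
            return slot.completion_id
    return None  # the completion is gone, so nothing is interrupted


# The client saw completion "A" in slot 0, but the slot is reused for "B"
# before the control request arrives.
slots = {0: Slot(0, "A")}
slots[0] = Slot(0, "B")

hit_by_slot = end_reasoning_by_slot(slots, 0)  # interrupts "B" by mistake
slots[0].reasoning_ended = False               # reset before the second case
hit_by_completion = end_reasoning_by_completion(slots, "A")  # finds nothing to interrupt
```

In this model the slot-keyed call interrupts the wrong completion, while the completion-keyed call is a harmless no-op once the original completion has finished.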

Affected Symbols