b9468
📦 llama-cpp
✨ 3 features · 🐛 3 fixes · 🔧 11 symbols
Summary
This release introduces real-time reasoning interruption via a new control endpoint, giving users more control over an ongoing generation. Control actions were also refactored to target a completion ID rather than a slot ID, which makes them more reliable.
Migration Steps
- If using the reasoning interruption feature, update control requests to use the completion ID instead of id_slot.
- If you rely on the skip button being visible during generation, note that it now appears only during the reasoning phase.
- The reasoning control action is now named "reasoning_end"; update any requests that still send the previous name.
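The first and third migration steps can be sketched as a small request builder. This is a minimal sketch, not the server's documented schema: the endpoint path and the "reasoning_end" action come from these notes, but the JSON field name used for the completion ID (`id` here) and the helper name `build_control_request` are assumptions made for illustration. The request is only constructed, not sent.

```python
import json
import urllib.request

# Endpoint path as stated in these release notes.
CONTROL_PATH = "/v1/chat/completions/control"

# ASSUMPTION: the notes do not name the JSON field that carries the
# completion ID, so "id" is a hypothetical placeholder.
COMPLETION_ID_FIELD = "id"


def build_control_request(
    base_url: str,
    completion_id: str,
    action: str = "reasoning_end",
) -> urllib.request.Request:
    """Build (but do not send) a control request keyed by completion ID."""
    payload = {COMPLETION_ID_FIELD: completion_id, "action": action}
    return urllib.request.Request(
        url=base_url.rstrip("/") + CONTROL_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "chatcmpl-123" is a made-up completion ID for demonstration only.
    req = build_control_request("http://localhost:8080", "chatcmpl-123")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To send the request, pass it to `urllib.request.urlopen(req)`. Check the server documentation for the actual field name before relying on this payload shape.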
✨ New Features
- Implemented real-time reasoning interruption via a new control endpoint (POST /v1/chat/completions/control).
- The control endpoint initially accepted { id_slot, action } and arms the budget sampler on demand using the reasoning_control flag; within this release, targeting moved from id_slot to the completion ID.
- The skip button in the WebUI now correctly keys off the new isReasoning state, showing only during the thinking phase.
🐛 Bug Fixes
- The control endpoint now targets reasoning control by completion ID instead of slot ID to prevent Time-of-Check to Time-of-Use (TOCTOU) issues, where a slot could be serving a different request by the time the control action arrives.
- Fixed an issue in the agentic flow where the completion ID was not being relayed, preventing the control request from being sent.
- The reasoning control model is now read from the streaming message instead of the unrelated model dropdown UI state for consistency.
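The TOCTOU issue behind the first fix can be illustrated with a toy model. This is a simplified sketch and not llama.cpp code: `ToyServer`, its methods, and the completion IDs are all hypothetical. It only shows why a control action keyed by slot can hit the wrong request once the slot has been reused, while one keyed by completion ID becomes a harmless no-op.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyServer:
    """Hypothetical stand-in for a server that runs completions in slots."""

    # slot index -> completion ID currently running in that slot
    slots: Dict[int, str] = field(default_factory=dict)
    # completion IDs whose reasoning has been interrupted
    interrupted: List[str] = field(default_factory=list)

    def control_by_slot(self, id_slot: int) -> Optional[str]:
        """Interrupt whatever is in the slot right now (racy)."""
        completion = self.slots.get(id_slot)
        if completion is not None:
            self.interrupted.append(completion)
        return completion

    def control_by_completion(self, completion_id: str) -> Optional[str]:
        """Interrupt only the named completion, if it is still running."""
        if completion_id in self.slots.values():
            self.interrupted.append(completion_id)
            return completion_id
        return None


if __name__ == "__main__":
    server = ToyServer(slots={0: "cmpl-A"})
    observed_slot = 0            # time of check: client sees cmpl-A in slot 0
    server.slots[0] = "cmpl-B"   # slot is reused before the control arrives

    # Time of use: the slot-keyed action lands on the wrong completion.
    print(server.control_by_slot(observed_slot))
    # The ID-keyed action finds no match and does nothing.
    print(server.control_by_completion("cmpl-A"))
```

Keying the action by completion ID turns a stale request into a no-op instead of an interruption of an unrelated generation.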