b8113
📦 llama-cpp
✨ 4 features · 🐛 2 fixes · 🔧 6 symbols
Summary
This release adds support for the Step-3.5-Flash model. Its XML tool calls are now parsed correctly and its thinking output is handled, both by routing the model to the Nemotron v3 PEG parser. Dead thinking code paths in the Qwen3-Coder XML handler were also removed.
Migration Steps
- If you rely on the Qwen3-Coder XML detection logic, which required bare <function> and plural <parameters> markers, note that detection was tightened for Nemotron v3 to prevent Step-3.5-Flash from being misrouted. Qwen3-Coder itself should remain unaffected because it lacks <think>.
✨ New Features
- Added support for Step-3.5-Flash format detection and thinking.
- Step-3.5-Flash tool calls are now routed to the Nemotron v3 PEG parser for streaming and schema-aware parameter parsing.
- Added thinking_forced_open support to Qwen3-Coder-XML initialization.
- Added <think>/</think> to preserved tokens.
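
To make "schema-aware parameter parsing" concrete, here is a minimal sketch in Python. It is not the Nemotron v3 PEG parser and does not stream; it uses regular expressions only to illustrate the idea. The wire format (`<tool_call>`, `<function=NAME>`, `<parameter=KEY>`) and the helper names `coerce` and `parse_tool_calls` are assumptions made for illustration and are not taken from these release notes.

```python
import json
import re

# Assumed wire format (hypothetical, for illustration only):
# <tool_call><function=NAME><parameter=KEY>VALUE</parameter></function></tool_call>
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>]+)>(.*?)</function>\s*</tool_call>", re.DOTALL
)
PARAM_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def coerce(raw, schema_type):
    """Convert raw parameter text using the declared JSON-schema type."""
    text = raw.strip()
    if schema_type == "string":
        return text  # keep strings verbatim, even if they look like numbers
    try:
        return json.loads(text)  # integers, numbers, booleans, arrays, objects
    except json.JSONDecodeError:
        return text  # fall back to the raw string when it is not valid JSON


def parse_tool_calls(output, schemas):
    """Return a list of {"name", "arguments"} dicts found in model output.

    `schemas` maps function name -> {parameter name -> JSON-schema type}.
    Parameters without a declared type are treated as strings.
    """
    calls = []
    for name, body in TOOL_CALL_RE.findall(output):
        types = schemas.get(name, {})
        args = {
            key: coerce(value, types.get(key, "string"))
            for key, value in PARAM_RE.findall(body)
        }
        calls.append({"name": name, "arguments": args})
    return calls
```

The point of consulting the schema is that the same text, such as `3`, becomes an integer for an `integer` parameter but stays the string `"3"` for a `string` parameter, rather than every argument being passed through as an unparsed string.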
🐛 Bug Fixes
- Fixed Step-3.5-Flash format detection. The model was previously misrouted to Hermes 2 Pro because it lacked the bare <function> and plural <parameters> markers required by the older logic, which left tool call arguments as JSON strings.
- Fixed missing thinking support in the Qwen3-Coder-XML format handler, so that reasoning_content is correctly separated from content in API responses when models such as Step-3.5-Flash emit <think>.
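
The separation of reasoning_content from content can be pictured with a small Python sketch. This is an illustration of the intended behavior, not llama.cpp's implementation: the function name `split_reasoning` is hypothetical, and the reading of thinking_forced_open (the prompt already ends with an opening `<think>`, so the output starts inside the reasoning block) is an assumption.

```python
def split_reasoning(output, thinking_forced_open=False):
    """Split model output into (reasoning_content, content).

    With thinking_forced_open=True the prompt is assumed to already end with
    an opening <think>, so the output starts inside the reasoning block.
    """
    open_tag, close_tag = "<think>", "</think>"
    stripped = output.lstrip()
    if stripped.startswith(open_tag):
        # Explicit opening tag (also tolerated when the tag was forced open).
        text = stripped[len(open_tag):]
    elif thinking_forced_open:
        text = output
    else:
        return "", output  # no reasoning block at all

    reasoning, sep, rest = text.partition(close_tag)
    if not sep:
        # Closing tag not seen yet, e.g. mid-stream: everything is reasoning.
        return reasoning.strip(), ""
    return reasoning.strip(), rest.strip()
```

For example, `split_reasoning("<think>plan</think>Hello")` yields the reasoning `"plan"` and the content `"Hello"`, and the same result is obtained from `"plan</think>Hello"` when `thinking_forced_open=True`.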