b8113

📅 Feb 20, 2026 · 📦 llama-cpp

✨ 4 features · 🐛 2 fixes · 🔧 6 symbols

Summary

This release adds support for the Step-3.5-Flash model. Its XML tool calls are now parsed correctly and its thinking output is handled, both by routing the model to the Nemotron v3 PEG parser. Dead thinking code paths in the Qwen3-Coder XML handler were also removed.

Migration Steps

No action is expected for most users. If you rely on the Qwen3-Coder XML detection logic that required bare <function> and plural <parameters> markers, note that the detection logic was tightened for Nemotron v3 so that Step-3.5-Flash is no longer misrouted. Qwen3-Coder itself should remain unaffected because it lacks <think>.
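
The routing outcome described above can be sketched as a marker check on the chat template. This is an illustrative simplification in Python, not the llama.cpp implementation (which is C++): the function name, the returned labels, and the exact marker set in XML_TOOL_MARKERS are assumptions made for the example. Only the role of <think> in separating Step-3.5-Flash from Qwen3-Coder, and the earlier Hermes 2 Pro fallback, come from these notes.

```python
# Hypothetical marker set for XML-style tool calls (an assumption for this sketch).
XML_TOOL_MARKERS = ("<tool_call>", "<function=", "<parameter=")


def detect_tool_call_format(template: str) -> str:
    """Pick a parser label from marker substrings found in a chat template."""
    has_xml_tool_tags = all(marker in template for marker in XML_TOOL_MARKERS)
    has_think = "<think>" in template

    if has_xml_tool_tags and has_think:
        # Step-3.5-Flash style: XML tool tags plus thinking output.
        return "nemotron_v3_peg"
    if has_xml_tool_tags:
        # Qwen3-Coder style: XML tool tags, no <think> in the template.
        return "qwen3_coder_xml"
    # Anything else falls through to the generic handler in this sketch.
    return "hermes_2_pro"
```

Under these assumptions, a template that contains both <think> and the XML tool tags is routed to the PEG parser, while a Qwen3-Coder style template without <think> keeps its existing handler.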

✨ New Features

Added support for Step-3.5-Flash format detection and thinking.
Step-3.5-Flash tool calls are now routed to the Nemotron v3 PEG parser for streaming and schema-aware parameter parsing.
Added thinking_forced_open support to Qwen3-Coder-XML initialization.
Added <think>/</think> to preserved tokens.
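
To illustrate what schema-aware parameter parsing means in practice, here is a minimal Python sketch. It assumes the XML tool call layout commonly associated with Qwen3-Coder style templates (<function=NAME> wrapping <parameter=KEY> blocks). The regular expressions, the parse_tool_call helper, and the sample schema are hypothetical and are not taken from llama.cpp, whose PEG parser also handles streaming, which this sketch does not.

```python
import json
import re

# Assumed tag layout: <function=NAME> ... <parameter=KEY>VALUE</parameter> ... </function>
FUNC_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>]+)>\n?(.*?)\n?</parameter>", re.DOTALL)


def parse_tool_call(text: str, schema: dict) -> dict:
    """Extract one tool call and coerce each parameter using the JSON schema."""
    match = FUNC_RE.search(text)
    if match is None:
        raise ValueError("no <function=...> block found")
    name, body = match.group(1), match.group(2)
    properties = schema.get("properties", {})

    arguments = {}
    for key, raw in PARAM_RE.findall(body):
        declared = properties.get(key, {}).get("type", "string")
        if declared == "string":
            # Declared strings are kept verbatim, even if they look like JSON.
            arguments[key] = raw
        else:
            try:
                arguments[key] = json.loads(raw)
            except json.JSONDecodeError:
                # Fall back to the raw text when the value is not valid JSON.
                arguments[key] = raw
    return {"name": name, "arguments": arguments}
```

With a schema that declares days as an integer, a parameter body of 3 becomes the number 3 rather than the string "3", while a parameter declared as a string is passed through unchanged.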

🐛 Bug Fixes

Fixed Step-3.5-Flash format detection. The model was previously misrouted to Hermes 2 Pro because it lacks the bare <function> and plural <parameters> markers required by the older logic, which left tool call arguments as JSON strings.
Fixed missing thinking support in the Qwen3-Coder-XML format handler, so that reasoning_content is correctly separated from content in API responses when models such as Step-3.5-Flash emit <think>.
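
The separation of reasoning_content from content can be sketched as follows. This is a hedged Python illustration, not llama.cpp code: split_reasoning is a hypothetical helper, and the reading of thinking_forced_open (the prompt already ends with an opening <think>, so the model output starts inside the reasoning block) is an assumption about how that flag is used.

```python
def split_reasoning(output: str, thinking_forced_open: bool = False) -> dict:
    """Split model output into reasoning_content and content around <think> tags."""
    open_tag, close_tag = "<think>", "</think>"
    text = output

    stripped = text.lstrip()
    if stripped.startswith(open_tag):
        # The model emitted the opening tag itself.
        text = stripped[len(open_tag):]
    elif not thinking_forced_open:
        # No reasoning block at all: everything is regular content.
        return {"reasoning_content": "", "content": output}

    reasoning, sep, rest = text.partition(close_tag)
    if not sep:
        # Closing tag not seen yet (for example mid-stream): all text is reasoning.
        return {"reasoning_content": reasoning.strip(), "content": ""}
    return {"reasoning_content": reasoning.strip(), "content": rest.strip()}
```

When the flag is set, output such as "plan</think>answer" is split even though the opening tag never appears in the generated text, which is the case the fix above describes for models that always think.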

Affected Symbols

Qwen3-Coder XML detection logic
Hermes 2 Pro routing logic
Qwen3-Coder-XML init function
build_grammar_xml_tool_call
Nemotron v3 PEG parser
qwen3_coder_xml thinking handling code