b8113

📦 llama-cpp
4 features · 🐛 2 fixes · 🔧 6 symbols

Summary

This release adds support for the Step-3.5-Flash model. The model is now routed to the Nemotron v3 PEG parser, which parses its XML tool calls correctly and handles its thinking output. Dead thinking code paths in the Qwen3-Coder XML handler were also removed.

Migration Steps

  1. If you rely on the format detection logic for Qwen3-Coder-style XML tool calls, note that its checks changed. The older logic required bare <function> and plural <parameters> markers, and the detection for Nemotron v3 was tightened to prevent Step-3.5-Flash from being misrouted. Qwen3-Coder itself should remain unaffected because it lacks <think>.
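
To see why a change in required markers can move a model from one handler to another, here is a toy first-match detector in Python. It is only an illustration of marker-based routing, not llama.cpp's detection code: the handler names (`strict-xml`, `relaxed-xml`, `generic`) and the exact marker sets are assumptions made up for this sketch.

```python
# Hypothetical rule table: (handler name, markers that must ALL appear in the template).
# Order matters: the first rule whose markers are all present wins.
RULES = (
    ("strict-xml", ("<tool_call>", "<function>", "<parameters>")),
    ("relaxed-xml", ("<tool_call>", "<function=")),
)
FALLBACK = "generic"  # hypothetical catch-all handler


def detect_format(template: str, rules=RULES) -> str:
    """Return the first handler whose required markers all occur in the template."""
    for name, markers in rules:
        if all(marker in template for marker in markers):
            return name
    return FALLBACK


if __name__ == "__main__":
    # A template that uses <function=NAME> but never a bare <function>.
    template = "<tool_call>\n<function=get_weather>\n<parameter=city>"
    print(detect_format(template, rules=RULES[:1]))  # only the strict rule: falls through
    print(detect_format(template))                   # with the relaxed rule available
```

With only the strict rule in place, a template without the bare markers falls through to the catch-all handler, which is the kind of misrouting this release addresses.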

✨ New Features

  • Added support for Step-3.5-Flash format detection and thinking.
  • Step-3.5-Flash tool calls are now routed to the Nemotron v3 PEG parser for streaming and schema-aware parameter parsing.
  • Added thinking_forced_open support to Qwen3-Coder-XML initialization.
  • Added <think>/</think> to preserved tokens.
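
As a rough picture of what "schema-aware parameter parsing" means, the Python sketch below converts each parameter's raw text according to the type declared in the tool's JSON schema. It is not the Nemotron v3 PEG parser and does not stream. The tag layout (`<function=NAME>` with `<parameter=KEY>` children) is assumed here for illustration, and `parse_tool_calls` and `coerce` are hypothetical helper names.

```python
import json
import re

# Assumed layout: <function=NAME> ... <parameter=KEY>VALUE</parameter> ... </function>
FUNCTION_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def coerce(raw: str, schema: dict):
    """Convert a raw parameter string according to its declared JSON-schema type."""
    text = raw.strip()
    if schema.get("type", "string") == "string":
        return text  # strings stay strings, even if they look like numbers
    try:
        return json.loads(text)  # numbers, booleans, arrays, objects
    except json.JSONDecodeError:
        return text  # keep the raw text if it is not valid JSON


def parse_tool_calls(output: str, tools: dict) -> list:
    """Extract tool calls and type their arguments using each tool's schema."""
    calls = []
    for name, body in FUNCTION_RE.findall(output):
        props = tools.get(name, {}).get("properties", {})
        args = {key: coerce(value, props.get(key, {}))
                for key, value in PARAM_RE.findall(body)}
        calls.append({"name": name, "arguments": args})
    return calls


if __name__ == "__main__":
    tools = {"get_weather": {"properties": {"zip": {"type": "string"},
                                            "days": {"type": "integer"}}}}
    output = ("<tool_call>\n<function=get_weather>\n"
              "<parameter=zip>\n75001\n</parameter>\n"
              "<parameter=days>\n3\n</parameter>\n"
              "</function>\n</tool_call>")
    print(parse_tool_calls(output, tools))
```

The schema decides the result type: `zip` remains the string "75001" while `days` becomes the integer 3, so arguments arrive as typed values rather than as one undifferentiated string.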

🐛 Bug Fixes

  • Fixed Step-3.5-Flash format detection. The model lacks the bare <function> and plural <parameters> markers that the older logic required, so it was misrouted to Hermes 2 Pro and its tool call arguments remained JSON strings.
  • Fixed the missing thinking support in the Qwen3-Coder-XML format handler, so that reasoning_content is correctly separated from content in API responses when models like Step-3.5-Flash emit <think>.
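
The separation of reasoning_content from content can be sketched in a few lines of Python. This is a simplified illustration, not the handler's actual code. It assumes that `thinking_forced_open` means the generation prompt already ended with an opening <think>, so the model's output starts inside the reasoning block. `split_reasoning` is a hypothetical helper name.

```python
def split_reasoning(output: str, thinking_forced_open: bool = False,
                    open_tag: str = "<think>", close_tag: str = "</think>"):
    """Split raw model output into a (reasoning_content, content) pair."""
    text = output
    stripped = output.lstrip()
    if stripped.startswith(open_tag):
        # The model emitted the opening tag itself.
        text = stripped[len(open_tag):]
    elif not thinking_forced_open:
        # No opening tag and the prompt did not open one: everything is content.
        return "", output
    reasoning, sep, rest = text.partition(close_tag)
    if not sep:
        # The reasoning block never closed (for example, a truncated generation).
        return reasoning.strip(), ""
    return reasoning.strip(), rest.strip()


if __name__ == "__main__":
    raw = "I should call the tool.</think>Hello"
    print(split_reasoning(raw, thinking_forced_open=True))
    print(split_reasoning(raw, thinking_forced_open=False))
```

Without the forced-open flag, the second call treats the whole string as content, which mirrors the symptom described above: reasoning text leaking into content instead of appearing in reasoning_content.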

Affected Symbols