b9745

📅 Jun 21, 2026 📦 llama-cpp

✨ 6 features 🐛 2 fixes 🔧 8 symbols

Summary

This release adds support for Step3.5/3.7 flash MTP, including new APIs and speculative multi-head process() and draft() functions. It also includes internal cleanups and two bug fixes.

Migration Steps

The function 'draft_multi_head' has been merged into 'draft()'; callers of 'draft_multi_head' should switch to 'draft()'.
The term 'nextn' has been renamed to 'mtp' in relevant contexts. Note that the new llama_model_n_nextn_layer API still uses the 'nextn' name.

✨ New Features

Support for the Step3.5/3.7 flash MTP (Multi-Token Prediction) implementation.
Graph reuse now takes mtp_layer_offset and the include nextn flag into account.
Introduced the llama_set_mtp_layer_offset and llama_model_n_nextn_layer APIs.
Implemented offset-based head selection; all MTP blocks are now required.
Added speculative multi-head process() and draft() functions.
Outputs are now gathered via inp_out_ids.
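
To illustrate what gathering outputs by index means, here is a minimal standalone sketch. It assumes inp_out_ids holds the row indices of the tokens whose outputs were requested, and that gathering copies just those rows out of a larger per-token matrix. The gather_rows helper is hypothetical and written in plain C++ for illustration; it is not a llama.cpp function and does not reflect how this release implements the change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of llama.cpp): copies the rows of a
// row-major [n_rows x n_cols] matrix that are named in out_ids,
// preserving the order in which the ids are listed.
std::vector<float> gather_rows(const std::vector<float> & src,
                               std::size_t n_cols,
                               const std::vector<int32_t> & out_ids) {
    // The source buffer must hold a whole number of rows.
    assert(n_cols > 0 && src.size() % n_cols == 0);
    const std::size_t n_rows = src.size() / n_cols;

    std::vector<float> dst;
    dst.reserve(out_ids.size() * n_cols);

    for (int32_t id : out_ids) {
        // Every id must refer to an existing row.
        assert(id >= 0 && static_cast<std::size_t>(id) < n_rows);
        const auto first = src.begin() +
            static_cast<std::ptrdiff_t>(static_cast<std::size_t>(id) * n_cols);
        dst.insert(dst.end(), first, first + static_cast<std::ptrdiff_t>(n_cols));
    }
    return dst;
}
```

With a 4 x 2 input and ids {1, 3}, the result contains only rows 1 and 3, so later stages process two rows instead of four.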

🐛 Bug Fixes

Fixes applied to core functionality.
Fix implemented for multi-sequence processing.

Affected Symbols

mtp_layer_offset
llama_set_mtp_layer_offset
llama_model_n_nextn_layer
speculative multi-head process()
speculative multi-head draft()
inp_out_ids
draft_multi_head
draft()