b9745
📦 llama-cpp
✨ 6 features · 🐛 2 fixes · 🔧 8 symbols
Summary
This release adds multi-token prediction (MTP) support for Step3.5/3.7 flash models, including new API functions and speculative multi-head drafting. It also includes several internal cleanups and bug fixes.
Migration Steps
- The function 'draft_multi_head' has been merged into 'draft()'.
- The term 'nextn' has been renamed to 'mtp' in relevant contexts.
✨ New Features
- Added support for Step3.5/3.7 flash MTP (multi-token prediction).
- Added mtp_layer_offset; the nextn flags are now taken into account when deciding whether a graph can be reused.
- Added the llama_set_mtp_layer_offset and llama_model_n_nextn_layer API functions.
- Implemented offset-based head selection and now require all MTP blocks to be present.
- Added speculative multi-head process() and draft() functions.
- Outputs are now gathered via inp_out_ids.
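Speculative decoding with MTP heads generally proceeds in two phases: the extra heads cheaply draft a few future tokens, then the main model verifies them and keeps the longest prefix it agrees with. The following is a minimal Python sketch of that general draft-then-verify pattern, not the llama.cpp implementation. The names `draft`, `verify`, and `generate` and the token-level toy "heads" are hypothetical. Real MTP heads read the main model's hidden states, and verification runs as one batched forward pass rather than one call per position.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in: a "model" or "head" is any function that maps a
# token context to the token it would choose greedily next.
NextToken = Callable[[Sequence[int]], int]


def draft(context: Sequence[int], heads: List[NextToken]) -> List[int]:
    """Propose one token per head; each head sees the context extended by
    the tokens already drafted earlier in this step."""
    drafted: List[int] = []
    for head in heads:
        drafted.append(head(list(context) + drafted))
    return drafted


def verify(context: Sequence[int], drafted: List[int], target: NextToken) -> List[int]:
    """Greedy verification: accept draft tokens while the target model
    agrees, then append the target's own token (a correction on the first
    mismatch, or a bonus token if the whole draft was accepted)."""
    accepted: List[int] = []
    for tok in drafted:
        expected = target(list(context) + accepted)
        if expected != tok:
            accepted.append(expected)  # the target's correction ends this step
            return accepted
        accepted.append(tok)
    accepted.append(target(list(context) + accepted))  # bonus token
    return accepted


def generate(prompt: Sequence[int], target: NextToken,
             heads: List[NextToken], n_new: int) -> List[int]:
    """Run draft/verify steps until at least n_new tokens are produced."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out += verify(out, draft(out, heads), target)
    return out[: len(prompt) + n_new]


if __name__ == "__main__":
    target = lambda ctx: (ctx[-1] + 1) % 10   # toy "main model"
    heads = [target, lambda ctx: 0]           # one accurate head, one poor head
    print(generate([1], target, heads, 5))
```

Because each step always emits at least one token chosen by the target model, greedy output is identical to decoding with the target alone. Accurate heads only reduce the number of steps.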
🐛 Bug Fixes
- Fixes applied to core functionality.
- Fixed multi-sequence processing.