b9235
📦 llama-cpp
✨ 2 features · 🐛 8 fixes · ⚡ 1 deprecation · 🔧 12 symbols
Summary
This release is an MTP clean-up that mainly affects speculative decoding: it fixes parameter handling, re-enables several previously disabled configurations, and updates documentation. Two deprecated speculative decoding CLI options were removed.
Migration Steps
- If you use the speculative decoding CLI, drop `spec-draft-ctx-size` and `spec-draft-replace` from your commands; both options have been removed.
- Review your speculative decoding CLI arguments: the defaults for `n_max`, `n_min`, and `p_min` have been corrected to match the implementation (`n_max=3`, `n_min=0`, `p_min=0.0`).
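To audit existing launch scripts for the removed options, a small check along these lines may help. This is a minimal sketch, not part of llama.cpp: the helper name `find_removed_options` is hypothetical, the leading `--` on the option names is an assumption (the notes give only the bare names), and the example command line is illustrative.

```python
# Hypothetical helper: scan a command line for options removed in this release.
# Assumption: options are written with leading dashes, as "--name value" or "--name=value".
REMOVED_OPTIONS = ("spec-draft-ctx-size", "spec-draft-replace")


def find_removed_options(argv):
    """Return the removed option names found in argv, in order of first appearance."""
    found = []
    for arg in argv:
        if not arg.startswith("-"):
            continue  # positional value, not an option
        # Strip the dashes and any "=value" suffix to get the bare option name.
        name = arg.lstrip("-").split("=", 1)[0]
        if name in REMOVED_OPTIONS and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    # Illustrative command line only.
    example = ["llama-server", "-m", "model.gguf", "--spec-draft-ctx-size", "4096"]
    for name in find_removed_options(example):
        print(f"removed option still in use: {name}")
```

Running the check over each saved command before upgrading gives a list of the arguments that need to be deleted.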
✨ New Features
- Extended the `--spec-default` configuration to include `ngram-map-k4v`.
- Added environment variables for new speculative decoding parameters.
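For readers unfamiliar with n-gram based drafting, the general idea can be sketched as follows. This is a generic illustration of n-gram lookup drafting under simplifying assumptions, not the `ngram-map-k4v` implementation; the function name `ngram_draft` and the `key_len` parameter are hypothetical.

```python
def ngram_draft(tokens, key_len=2, n_max=3):
    """Propose up to n_max draft tokens.

    Takes the last key_len tokens as a key, finds the most recent earlier
    occurrence of that key in the history, and copies the tokens that followed it.
    Returns an empty list when the key has not been seen before.
    """
    if len(tokens) <= key_len:
        return []
    key = tuple(tokens[-key_len:])
    # Search backwards, starting just before the trailing key itself.
    for start in range(len(tokens) - key_len - 1, -1, -1):
        if tuple(tokens[start:start + key_len]) == key:
            return list(tokens[start + key_len:start + key_len + n_max])
    return []


if __name__ == "__main__":
    history = [1, 2, 3, 4, 1, 2]
    print(ngram_draft(history))
```

The drafted tokens are only proposals; the target model still verifies them, so a wrong guess costs time but does not change the output.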
🐛 Bug Fixes
- Disabled equal splits for recurrent memory with partial rollback in llama.
- Re-enabled p-min with MTP drafts in spec.
- Re-enabled ngram spec in combination with RS rollback in spec.
- Fixed ngram-map-* parameters in spec.
- Fixed acceptance logic in combined ngram + draft configs in spec.
- Fixed reuse for combined `token` + `embd` batches in graph.
- Relaxed ngram-mod rejection threshold to 0.25 @ 5 low in spec.
- Fixed the `n_embd` log output (minor change).
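Several of these fixes concern how draft tokens are filtered before verification. The sketch below shows one plausible way `n_max`, `n_min`, and `p_min` could interact, using the default values quoted in the migration steps. The semantics are an assumption made for illustration (stop at the first token below `p_min`, cap the draft at `n_max`, discard drafts shorter than `n_min`), and `SpecParams` and `truncate_draft` are hypothetical names, not llama.cpp symbols.

```python
from dataclasses import dataclass


@dataclass
class SpecParams:
    # Default values as stated in the migration steps of these notes.
    n_max: int = 3
    n_min: int = 0
    p_min: float = 0.0


def truncate_draft(candidates, params=None):
    """Filter a draft given as (token, probability) pairs in draft order.

    Assumed semantics, for illustration only:
      - stop at the first token whose probability is below p_min,
      - keep at most n_max tokens,
      - return an empty draft if fewer than n_min tokens survive.
    """
    if params is None:
        params = SpecParams()
    draft = []
    for token, prob in candidates:
        if len(draft) >= params.n_max or prob < params.p_min:
            break
        draft.append(token)
    return draft if len(draft) >= params.n_min else []


if __name__ == "__main__":
    cands = [(10, 0.9), (11, 0.8), (12, 0.7), (13, 0.6)]
    print(truncate_draft(cands))
    print(truncate_draft(cands, SpecParams(p_min=0.75)))
```

With `p_min=0.0` no token is ever rejected on probability, so under these assumptions the default configuration is limited only by `n_max`.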
⚡ Deprecations
- Removed two deprecated speculative decoding CLI options: `spec-draft-ctx-size` and `spec-draft-replace`.