Change8

b9235

📦 llama-cpp
✨ 2 features · 🐛 8 fixes · ⚡ 1 deprecation · 🔧 12 symbols

Summary

This release focuses on MTP clean-up. Most changes affect speculative decoding: parameter handling is fixed, certain configurations are re-enabled, and the documentation is updated. Two deprecated speculative decoding CLI options, spec-draft-ctx-size and spec-draft-replace, were removed.

Migration Steps

  1. If you pass spec-draft-ctx-size or spec-draft-replace on the speculative decoding CLI, drop them from your invocations; both options have been removed.
  2. Review your speculative decoding CLI arguments: the defaults for n_max, n_min, and p_min have been corrected to match the implementation (n_max=3, n_min=0, p_min=0.0).
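
The first migration step can be checked mechanically. The sketch below is a hypothetical helper, not part of llama.cpp: it scans an argument list for the two removed option names. It assumes the options are passed with a leading dash prefix (for example `--spec-draft-ctx-size`); the notes above list the names without dashes, so adjust the matching if your invocations spell them differently.

```python
# Hypothetical helper, not part of llama.cpp.
# Option names are taken from the migration notes; the leading "--"
# spelling is an assumption.
REMOVED_OPTIONS = ("spec-draft-ctx-size", "spec-draft-replace")


def find_removed_options(argv):
    """Return the removed option names that appear in argv, in order."""
    found = []
    for arg in argv:
        if not arg.startswith("-"):
            continue  # plain values are not option names
        # Accept both "--name value" and "--name=value" forms.
        name = arg.lstrip("-").split("=", 1)[0]
        if name in REMOVED_OPTIONS and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    # Example argument list (illustrative values only).
    example = ["--model", "model.gguf", "--spec-draft-ctx-size", "4096"]
    for name in find_removed_options(example):
        print(f"removed option still in use: {name}")
```

Running a check like this over launch scripts or service definitions before upgrading gives a quick list of invocations that need editing.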

✨ New Features

  • Extended --spec-default configuration to include ngram-map-k4v.
  • Added environment variables for new speculative decoding parameters.

🐛 Bug Fixes

  • Disabled equal splits for recurrent memory with partial rollback in llama.
  • Re-enabled p-min with MTP drafts in spec.
  • Re-enabled ngram spec in combination with RS rollback in spec.
  • Fixed ngram-map-* parameters in spec.
  • Fixed acceptance logic in combined ngram + draft configs in spec.
  • Fixed reuse for combined `token` + `embd` batches in graph.
  • Relaxed ngram-mod rejection threshold to 0.25 @ 5 low in spec.
  • Fixed n_embd log in minor.

Affected Symbols

⚡ Deprecations

  • Removed deprecated options from speculative decoding CLI arguments: spec-draft-ctx-size and spec-draft-replace.