b9235

📅 May 20, 2026 📦 llama-cpp

✨ 2 features 🐛 8 fixes ⚡ 1 deprecation 🔧 12 symbols

Summary

This release focuses on MTP clean-up. It mainly affects speculative decoding: parameter handling is fixed, certain previously disabled configurations are re-enabled, and the documentation is updated. Two deprecated speculative decoding CLI options (spec-draft-ctx-size and spec-draft-replace) were removed.

Migration Steps

If you use the speculative decoding CLI, drop spec-draft-ctx-size and spec-draft-replace from your invocations; both options have been removed.
Review your speculative decoding CLI arguments: the defaults for n_max, n_min, and p_min have been corrected to match the implementation (n_max=3, n_min=0, p_min=0.0).
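
To help with the first migration step, existing launch commands can be scanned for the removed options. The following is a minimal sketch, not part of llama.cpp: the helper name find_removed_options is hypothetical, the option names come from this release note, and the leading "--" prefix and the sample command line are assumptions for illustration.

```python
# Option names as listed in this release note.
# Assumption: on the command line they appear with a leading "--".
REMOVED_OPTIONS = ("spec-draft-ctx-size", "spec-draft-replace")


def find_removed_options(argv):
    """Return the removed option names found in argv, in order of first appearance."""
    found = []
    for arg in argv:
        if not arg.startswith("-"):
            continue  # positional value, not an option
        # Normalize "--name" and "--name=value" to "name".
        name = arg.lstrip("-").split("=", 1)[0]
        if name in REMOVED_OPTIONS and name not in found:
            found.append(name)
    return found


# Illustrative command line (hypothetical invocation).
example = ["llama-server", "-m", "model.gguf", "--spec-draft-ctx-size", "4096"]
print(find_removed_options(example))  # ['spec-draft-ctx-size']
```

An empty result means the command line does not reference either removed option; any name returned should be deleted from the invocation before upgrading.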

✨ New Features

Extended --spec-default configuration to include ngram-map-k4v.
Added environment variables for new speculative decoding parameters.

🐛 Bug Fixes

Disabled equal splits for recurrent memory with partial rollback in llama.
Re-enabled p-min with MTP drafts in spec.
Re-enabled ngram spec in combination with RS rollback in spec.
Fixed ngram-map-* parameters in spec.
Fixed acceptance logic in combined ngram + draft configs in spec.
Fixed reuse for combined `token` + `embd` batches in graph.
Relaxed ngram-mod rejection threshold to 0.25 @ 5 low in spec.
Fixed the n_embd log output (minor).

Affected Symbols

llama:recurrent_memory spec:p-min spec:ngram_spec spec:ngram-map-* spec:acceptance_logic graph:token_embd_batch_reuse spec:ngram-mod_rejection_thold n_embd_log --spec-default --spec-type spec-draft-ctx-size spec-draft-replace

⚡ Deprecations

Removed deprecated options from speculative decoding CLI arguments: spec-draft-ctx-size and spec-draft-replace.