b9464

📅 Jun 1, 2026 · 📦 llama-cpp

✨ 2 features · 🐛 2 fixes · 🔧 2 symbols

Summary

This release refactors the speculative decoding logic by introducing common_speculative_n_max, fixes issues related to n_outputs_max, and no longer enables draft-simple mode automatically.

Migration Steps

If you relied on draft-simple being enabled automatically, you may now need to enable it explicitly.
Review any code that depended on server_n_outputs_max; its logic has moved to common_speculative_n_max.

✨ New Features

Added common_speculative_n_max helper function to centralize speculative max-draft-size logic.
Draft context now always includes n_parallel outputs.
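
To illustrate what centralizing the max-draft-size logic in one helper can look like, here is a minimal C++ sketch. It is not the llama.cpp implementation: the struct spec_mode_sketch, the function speculative_n_max_sketch, and the rule "take the largest draft size among enabled modes" are all assumptions made for illustration, and the real common_speculative_n_max may have a different signature and different rules.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one speculative mode's settings.
// This is NOT a real llama.cpp type.
struct spec_mode_sketch {
    bool    enabled;      // whether this mode is active
    int32_t n_draft_max;  // most tokens this mode may draft per step
};

// Hypothetical helper: returns the largest draft size among the enabled
// modes, or 0 when no mode is enabled. Keeping this in one function lets
// every caller size its buffers from the same rule instead of each
// caller re-deriving it.
int32_t speculative_n_max_sketch(const std::vector<spec_mode_sketch> & modes) {
    int32_t n_max = 0;
    for (const auto & m : modes) {
        if (m.enabled) {
            n_max = std::max(n_max, m.n_draft_max);
        }
    }
    return n_max;
}
```

With a single helper like this, a change to the sizing rule only has to be made in one place, which is presumably the motivation for moving the logic out of server-specific code.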

🐛 Bug Fixes

Fixed logic related to n_outputs_max in speculative decoding.
Removed automatic enabling of draft-simple mode.

Affected Symbols

server_n_outputs_max
common_speculative_n_max