
b9464

📦 llama-cpp
2 features  🐛 2 fixes  🔧 2 symbols

Summary

This release refactors speculative decoding by introducing the common_speculative_n_max helper, fixes the handling of n_outputs_max, and no longer enables draft-simple mode automatically.

Migration Steps

  1. If you relied on draft-simple being enabled automatically, enable it explicitly.
  2. Review any code that depended on server_n_outputs_max; that logic now lives in common_speculative_n_max.

✨ New Features

  • Added common_speculative_n_max helper function to centralize speculative max-draft-size logic.
  • Draft context now always includes n_parallel outputs.
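
To illustrate what centralizing the max-draft-size logic can look like, here is a minimal C++ sketch. It is not the llama.cpp implementation: the spec_config struct, its fields, and the function name speculative_n_max_sketch are hypothetical, and the real signature of common_speculative_n_max is not shown in these notes. The sketch assumes each draft strategy advertises its own limit and that callers want the largest limit among the enabled ones.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one speculative-decoding configuration.
// Field names are illustrative and not taken from llama.cpp.
struct spec_config {
    bool    enabled; // whether this draft strategy is active
    int32_t n_max;   // maximum number of tokens it may draft per step
};

// Hypothetical helper: the largest draft size any enabled strategy can produce.
// Returns 0 when no strategy is enabled (or when all limits are non-positive).
int32_t speculative_n_max_sketch(const std::vector<spec_config> & configs) {
    int32_t result = 0;
    for (const auto & cfg : configs) {
        if (cfg.enabled) {
            result = std::max(result, cfg.n_max);
        }
    }
    return result;
}
```

Keeping this computation in one function means every caller that sizes buffers or output counts for drafting uses the same bound, instead of each recomputing it separately.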

🐛 Bug Fixes

  • Fixed logic related to n_outputs_max in speculative decoding.
  • Removed automatic enabling of draft-simple mode.

Affected Symbols