b9460

📅 Jun 1, 2026 📦 llama-cpp

✨ 3 features 🔧 3 symbols

Summary

This release limits the maximum number of outputs a `llama_context` can hold in order to conserve VRAM, adds the `n_outputs_per_seq` parameter, and moves the `n_outputs_max` configuration into the server context.
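To see why capping outputs can save memory, here is a rough back-of-the-envelope sketch. It assumes the context reserves one float32 logit per vocabulary entry for every output row; the helper `logits_buffer_bytes` and all the numbers are hypothetical and are not taken from the llama.cpp source.

```python
def logits_buffer_bytes(n_outputs: int, n_vocab: int, bytes_per_logit: int = 4) -> int:
    """Estimate the bytes needed to store one row of logits per output.

    Assumption (not confirmed by the release notes): each output row holds
    n_vocab logits of bytes_per_logit bytes each (4 for float32).
    """
    if n_outputs < 0 or n_vocab < 0 or bytes_per_logit <= 0:
        raise ValueError("sizes must be non-negative and bytes_per_logit positive")
    return n_outputs * n_vocab * bytes_per_logit


# Hypothetical model and batch sizes, chosen only for illustration.
n_vocab = 128_000

# Reserving an output row for every token of a 4096-token batch.
uncapped = logits_buffer_bytes(n_outputs=4096, n_vocab=n_vocab)

# Reserving only a handful of output rows (e.g. one per active sequence).
capped = logits_buffer_bytes(n_outputs=4, n_vocab=n_vocab)

print(f"uncapped: {uncapped / 2**20:.1f} MiB")
print(f"capped:   {capped / 2**20:.1f} MiB")
```

Under these assumptions the reserved buffer scales linearly with the number of outputs, so a lower cap translates directly into a smaller allocation.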

Migration Steps

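If you use `llama_context` directly, be aware that its maximum number of outputs is now limited.
Where applicable, update your code to use the new `n_outputs_per_seq` parameter.
If you rely on `n_outputs_max`, check the server context configuration, where this setting now lives.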

✨ New Features

Introduced a limit on the maximum number of outputs of `llama_context`.
Added the `n_outputs_per_seq` parameter.
Moved the `n_outputs_max` configuration to the server context.

Affected Symbols

`llama_context`, `n_outputs_per_seq`, `n_outputs_max`