b9460
llama-cpp
Summary
This release caps the maximum number of outputs a `llama_context` can produce, to conserve VRAM. It also refactors output configuration: it adds a new `n_outputs_per_seq` parameter and moves `n_outputs_max` to the server context.
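A rough way to see why capping outputs saves memory: if each output slot reserves one float32 row of `n_vocab` logits (plus an `n_embd` row when embeddings are extracted), the reservation grows linearly with the cap. That buffer layout is an assumption made for this sketch; the release notes do not describe it. The function name `output_buffer_bytes` is hypothetical, and the caps of 512 and 4 are illustrative values, not defaults from the release.

```python
def output_buffer_bytes(n_outputs_max: int, n_vocab: int,
                        n_embd: int = 0, bytes_per_value: int = 4) -> int:
    """Estimate the bytes reserved for output slots (hypothetical helper).

    Assumes each slot holds one row of n_vocab logits and, if embeddings
    are also extracted, one row of n_embd values, stored as float32.
    """
    if n_outputs_max < 0 or n_vocab <= 0:
        raise ValueError("n_outputs_max must be >= 0 and n_vocab > 0")
    return n_outputs_max * (n_vocab + n_embd) * bytes_per_value


MIB = 1024 * 1024
n_vocab = 128256  # e.g. the Llama 3 vocabulary size

# Compare an illustrative large cap with a small one.
for cap in (512, 4):
    print(f"{cap:4d} outputs -> {output_buffer_bytes(cap, n_vocab) / MIB:.2f} MiB")
```

Under these assumptions, 512 slots reserve about 250.5 MiB, while 4 slots need roughly 2 MiB. This is why bounding the number of outputs, rather than sizing for the worst case, can matter on memory-constrained GPUs.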
Migration Steps
- If you use `llama_context` directly, note the new limit on the maximum number of outputs.
- Where applicable, update your code to use the new `n_outputs_per_seq` parameter.
- `n_outputs_max` is now configured in the server context; review your server configuration accordingly.
✨ New Features
- Introduced a limit on the maximum number of outputs of `llama_context`.
- Added the `n_outputs_per_seq` parameter.
- Moved the `n_outputs_max` configuration to the server context.
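The sketch below shows one way a per-sequence limit and a context-wide maximum could fit together, and how a caller might validate a batch before submitting it. It is purely illustrative. The field names mirror the release notes, but `OutputLimits`, `n_seq_max`, and `check_batch_outputs` are hypothetical. The relationship `n_outputs_max = n_seq_max * n_outputs_per_seq` is an assumption, not something the release states.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OutputLimits:
    """Hypothetical model of the output limits (not a llama.cpp API)."""
    n_seq_max: int          # number of parallel sequences (assumed)
    n_outputs_per_seq: int  # outputs each sequence may request per batch

    @property
    def n_outputs_max(self) -> int:
        # Assumption: the overall cap is the per-sequence cap times the
        # number of sequences.
        return self.n_seq_max * self.n_outputs_per_seq


def check_batch_outputs(limits: OutputLimits,
                        outputs_per_seq: Dict[int, int]) -> int:
    """Validate requested outputs per sequence id and return the total.

    Raises ValueError if a sequence id is out of range or a sequence
    requests more outputs than the per-sequence limit allows.
    """
    for seq_id, n in outputs_per_seq.items():
        if not 0 <= seq_id < limits.n_seq_max:
            raise ValueError(f"sequence id {seq_id} is out of range")
        if n > limits.n_outputs_per_seq:
            raise ValueError(
                f"sequence {seq_id} requests {n} outputs; "
                f"limit is {limits.n_outputs_per_seq}")
    # With both checks passing, the total cannot exceed n_outputs_max.
    return sum(outputs_per_seq.values())
```

In ordinary token-by-token generation, each sequence typically needs only the logits of its last token, so a per-sequence limit of 1 suffices. A larger per-sequence value would presumably matter only for workloads that read several outputs per sequence in one batch, such as verifying draft tokens.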