b9460

📦 llama-cpp
3 features · 🔧 3 symbols

Summary

This release limits the maximum number of outputs of a `llama_context` to conserve VRAM, adds an `n_outputs_per_seq` parameter, and refactors output configuration by moving `n_outputs_max` to the server context.

Migration Steps

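  1. If your code creates a `llama_context`, account for the new limit on its maximum number of outputs.
  2. Where applicable, update your code to use the new `n_outputs_per_seq` parameter.
  3. If you set `n_outputs_max`, check the server context configuration, where that setting now lives.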

✨ New Features

  • Introduced limit on max outputs of `llama_context`.
  • Added `n_outputs_per_seq` parameter.
  • Moved `n_outputs_max` configuration to server-context.
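
To give a rough sense of why capping outputs can save memory, here is a back-of-the-envelope sketch. It is not taken from the llama.cpp source. It assumes each output occupies one row of `n_vocab` float32 logits, and it assumes one plausible relationship between the cap and `n_outputs_per_seq` (the number of sequences times the outputs per sequence, never more than the batch size). The helper names and the example numbers are hypothetical.

```python
# Illustrative estimate only; NOT llama.cpp's actual allocation logic.
# ASSUMPTION: every output row stores n_vocab float32 logits.

BYTES_PER_F32 = 4


def output_buffer_bytes(n_outputs_max: int, n_vocab: int) -> int:
    """Bytes needed for n_outputs_max rows of n_vocab float32 logits."""
    if n_outputs_max < 0 or n_vocab < 0:
        raise ValueError("sizes must be non-negative")
    return n_outputs_max * n_vocab * BYTES_PER_F32


def capped_outputs(n_seq_max: int, n_outputs_per_seq: int, n_batch: int) -> int:
    """Hypothetical cap: outputs per sequence times sequences, at most n_batch.

    ASSUMPTION: this relationship is chosen for illustration; the release
    notes do not state how the real limit is derived.
    """
    if min(n_seq_max, n_outputs_per_seq, n_batch) < 0:
        raise ValueError("sizes must be non-negative")
    return min(n_batch, n_seq_max * n_outputs_per_seq)


# Example numbers (hypothetical): a 128256-token vocabulary, batch of 2048.
uncapped = output_buffer_bytes(2048, 128256)
capped = output_buffer_bytes(capped_outputs(4, 1, 2048), 128256)
```

Under these assumptions, reserving a row for every token in the batch takes about 1 GB, while four sequences with one output each need about 2 MB. The real savings depend on how the limit is actually computed.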

Affected Symbols