b8133
📦 llama-cpp
⚠ 2 breaking · ✨ 3 features · 🐛 1 fix · 🔧 5 symbols
Summary
This release removes output ids, logits, and embeddings from the saved llama context state, which requires updates to session handling and state-loading code. Bug fixes address a sequence-allocation error in the save-load-state example for recurrent and hybrid models.
⚠️ Breaking Changes
- Removed write/read operations for output ids, logits, and embeddings from the llama context state. Code relying on these being present in the state must be updated.
- Session handling in the completion tool was updated; logits are no longer stored in the session file. Users must now replay the last token after loading a session to regenerate logits for sampling.
Migration Steps
- If using session saving/loading, ensure that the last token is replayed after loading the state to regenerate necessary logits.
- Update any code that directly reads or writes output ids, logits, or embeddings from the llama context state.
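The migration steps above can be sketched roughly against the llama.cpp C API as follows. This is a minimal sketch, not the project's reference implementation: `llama_state_load_file`, `llama_decode`, and `llama_batch_get_one` are existing API calls, but the KV-cache eviction call is named differently across llama.cpp versions (shown here as `llama_kv_cache_seq_rm`), and error handling is elided.

```cpp
// Minimal sketch: reload a session, then replay the last token so the
// sampler has logits again (they are no longer stored in the state).
std::vector<llama_token> tokens(llama_n_ctx(ctx));
size_t n_tokens = 0;
if (!llama_state_load_file(ctx, "session.bin", tokens.data(),
                           tokens.size(), &n_tokens)) {
    fprintf(stderr, "failed to load session\n");
    return 1;
}
tokens.resize(n_tokens);

if (!tokens.empty()) {
    // Evict the last position from the KV cache so it can be decoded again
    // (call name varies by llama.cpp version; adjust to your headers).
    llama_kv_cache_seq_rm(ctx, 0, n_tokens - 1, -1);

    // Re-decode the final token to regenerate logits for sampling.
    llama_token last = tokens.back();
    llama_decode(ctx, llama_batch_get_one(&last, 1));
    // llama_get_logits_ith(ctx, -1) now yields fresh logits.
}
```

The design trade-off: session files shrink and load faster, at the cost of one extra single-token decode after every load.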
✨ New Features
- Added replaying of the last session token in the completion tool to compensate for the removed logit storage.
- Added common_prompt_batch_decode function for decoding prompts and optionally handling session data saving.
- Updated save-load-state example to use llama_state_load_file and replay the last token after loading.
🐛 Bug Fixes
- Fixed an error in the save-load-state example for recurrent/hybrid models: the context was created with n_seq_max = 1, which failed when a second sequence was needed; ctx3 is now initialized with n_seq_max = 2.
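The fix amounts to reserving a second sequence slot when the third context is created. A rough sketch (function names follow recent llama.cpp; treat the exact initializer name as an assumption, since older versions use `llama_new_context_with_model`):

```cpp
// Sketch of the ctx3 fix: recurrent/hybrid models copy state into a
// second sequence, so the context must be created with room for it.
llama_context_params cparams = llama_context_default_params();
cparams.n_seq_max = 2; // was 1; reserve the second sequence up front
llama_context * ctx3 = llama_init_from_model(model, cparams);
```

With n_seq_max = 1, any operation targeting sequence id 1 fails at allocation time; raising the limit at context creation is the only place this can be fixed, since n_seq_max is immutable afterwards.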