b8133
📦 llama-cpp
⚠ 2 breaking · ✨ 3 features · 🐛 1 fix · 🔧 5 symbols
Summary
This release removes output ids, logits, and embeddings from the saved llama context state, which requires updates to session handling and state-loading code. Bug fixes address a sequence-allocation error in the save-load-state example for recurrent and hybrid models.
⚠️ Breaking Changes
- Removed write/read operations for output ids, logits, and embeddings from the llama context state. Code relying on these being present in the state must be updated.
- Session handling in the completion tool was updated; logits are no longer stored in the session file. Users must now replay the last token after loading a session to regenerate logits for sampling.
Migration Steps
- If using session saving/loading, ensure that the last token is replayed after loading the state to regenerate necessary logits.
- Update any code that directly reads or writes output ids, logits, or embeddings from the llama context state.
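The migration steps above can be sketched roughly against the llama.cpp C API as follows. This is a minimal sketch, not the project's reference implementation: `llama_state_load_file`, `llama_decode`, and `llama_batch_get_one` are existing API calls, but the KV-cache eviction call is named differently across llama.cpp versions (shown here as `llama_kv_cache_seq_rm`), and error handling is elided.

```cpp
// Minimal sketch: reload a session, then replay the last token so the
// sampler has logits again (they are no longer stored in the state).
std::vector<llama_token> tokens(llama_n_ctx(ctx));
size_t n_tokens = 0;
if (!llama_state_load_file(ctx, "session.bin", tokens.data(),
                           tokens.size(), &n_tokens)) {
    fprintf(stderr, "failed to load session\n");
    return 1;
}
tokens.resize(n_tokens);

if (!tokens.empty()) {
    // Evict the last position from the KV cache so it can be decoded again
    // (call name varies by llama.cpp version; adjust to your headers).
    llama_kv_cache_seq_rm(ctx, 0, n_tokens - 1, -1);

    // Re-decode the final token to regenerate logits for sampling.
    llama_token last = tokens.back();
    llama_decode(ctx, llama_batch_get_one(&last, 1));
    // llama_get_logits_ith(ctx, -1) now yields fresh logits.
}
```

The design trade-off: session files shrink and load faster, at the cost of one extra single-token decode after every load.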
✨ New Features
- Added replaying of the last session token in the completion tool to compensate for the removed logit storage.
- Added common_prompt_batch_decode function for decoding prompts and optionally handling session data saving.
- Updated save-load-state example to use llama_state_load_file and replay the last token after loading.
🐛 Bug Fixes
- Fixed an error in the save-load-state example for recurrent/hybrid models: the context was created with n_seq_max = 1, which failed when a second sequence was needed; ctx3 is now initialized with n_seq_max = 2.
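The fix amounts to reserving a second sequence slot when the third context is created. A rough sketch (function names follow recent llama.cpp; treat the exact initializer name as an assumption, since older versions use `llama_new_context_with_model`):

```cpp
// Sketch of the ctx3 fix: recurrent/hybrid models copy state into a
// second sequence, so the context must be created with room for it.
llama_context_params cparams = llama_context_default_params();
cparams.n_seq_max = 2; // was 1; reserve the second sequence up front
llama_context * ctx3 = llama_init_from_model(model, cparams);
```

With n_seq_max = 1, any operation targeting sequence id 1 fails at allocation time; raising the limit at context creation is the only place this can be fixed, since n_seq_max is immutable afterwards.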