b9310
📦 llama-cppView on GitHub →
✨ 2 features🐛 3 fixes🔧 3 symbols
Summary
This release focuses on improving server checkpoint creation reliability, especially for chat and multimodal prompts, and includes various platform-specific binary updates. A new configuration option `--checkpoint-min-step` has been added to manage checkpoint frequency.
Migration Steps
- When creating context checkpoints, note that spans are now extracted from chat templates.
- Be aware that checkpoint creation logic now avoids periodic mid-prompt checkpoints when the prompt token position before the latest user message is known.
- If using multimodal prompts, ensure mapping of text/template positions to server prompt tokens is handled correctly due to related fixes.
✨ New Features
- Added support for autoparser detection for message barriers.
- Introduced `common_chat_split_by_role` in the common module.
🐛 Bug Fixes
- Fixed checkpoint creation in the server component.
- Fixed spans calculation to correctly reach the end of the message.
- Fixed message span delimiter in the server component.