Change8

b9310

📦 llama-cppView on GitHub →
2 features🐛 3 fixes🔧 3 symbols

Summary

This release focuses on improving server checkpoint creation reliability, especially for chat and multimodal prompts, and includes various platform-specific binary updates. A new configuration option `--checkpoint-min-step` has been added to manage checkpoint frequency.

Migration Steps

  1. When creating context checkpoints, note that spans are now extracted from chat templates.
  2. Be aware that checkpoint creation logic now avoids periodic mid-prompt checkpoints when the prompt token position before the latest user message is known.
  3. If using multimodal prompts, ensure mapping of text/template positions to server prompt tokens is handled correctly due to related fixes.

✨ New Features

  • Added support for autoparser detection for message barriers.
  • Introduced `common_chat_split_by_role` in the common module.

🐛 Bug Fixes

  • Fixed checkpoint creation in the server component.
  • Fixed spans calculation to correctly reach the end of the message.
  • Fixed message span delimiter in the server component.

Affected Symbols