b9509

📦 llama-cpp
🐛 1 fix

Summary

This release fixes redundant checkpoint restores in the server: a checkpoint is no longer restored unnecessarily when the task contains new tokens to process. Pre-built binaries are available for multiple operating systems and hardware configurations.

🐛 Bug Fixes

  • Avoided an unnecessary checkpoint restore when new tokens are present: the -1 offset in the pos_min_thold calculation is now applied only when n_past >= task.n_tokens().
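
The condition described in this fix can be illustrated with a minimal sketch. This is not the actual llama.cpp code: the function name `pos_min_threshold`, the parameter `n_window`, and the base formula (`n_past - n_window`, clamped at 0) are assumptions made for illustration. Only the rule itself, applying the -1 offset when `n_past >= task.n_tokens()` and skipping it otherwise, comes from the release note.

```cpp
#include <algorithm>

// Hypothetical helper, not part of llama.cpp.
//   n_past        : number of tokens already present in the cache
//   n_task_tokens : total tokens in the task (stands in for task.n_tokens())
//   n_window      : assumed window size subtracted from n_past (illustrative)
int pos_min_threshold(int n_past, int n_task_tokens, int n_window) {
    // Per the release note: apply the -1 offset only when the cache already
    // covers the whole task, i.e. there are no new tokens to process.
    const int offset = (n_past >= n_task_tokens) ? 1 : 0;

    // Assumed base formula; clamp so the threshold never goes negative.
    return std::max(0, n_past - n_window - offset);
}
```

Under these assumptions, `pos_min_threshold(10, 10, 4)` yields 5 because the offset applies, while `pos_min_threshold(10, 12, 4)` yields 6 because two new tokens are present and the offset is skipped.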