b9509

📅 Jun 4, 2026 · 📦 llama-cpp

🐛 1 fix

Summary

This release improves server performance by skipping a redundant checkpoint restore when a request still has new tokens to process. Pre-built binaries are provided for a range of operating systems and hardware configurations.

🐛 Bug Fixes

Avoided an unnecessary checkpoint restore when new tokens are present: the -1 offset in the pos_min_thold calculation is now applied only when n_past >= task.n_tokens().
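
The condition in this fix can be sketched roughly as below. This is an illustration under stated assumptions, not the actual llama.cpp server code: the helper name compute_pos_min_thold, its signature, and the n_window parameter are hypothetical. Only the identifiers pos_min_thold, n_past, and task.n_tokens() (represented here by the plain integer n_tokens) come from the release note.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper illustrating the conditional -1 offset.
// n_past   : number of tokens already present in the cache
// n_tokens : total tokens in the task (stands in for task.n_tokens())
// n_window : assumed look-back window subtracted from n_past (illustrative only)
static int32_t compute_pos_min_thold(int32_t n_past, int32_t n_tokens, int32_t n_window) {
    // Apply the -1 offset only when the cache already covers the whole prompt.
    // If new tokens remain (n_past < n_tokens), no offset is applied.
    const int32_t offset = (n_past >= n_tokens) ? 1 : 0;

    // Clamp at zero so the threshold never becomes negative.
    return std::max<int32_t>(0, n_past - n_window - offset);
}
```

With these assumed inputs, a fully cached prompt (n_past = 10, n_tokens = 10, n_window = 4) gives a threshold of 5, while a prompt with new tokens (n_past = 10, n_tokens = 12, n_window = 4) gives 6. The higher threshold in the second case is what would make a restore less likely to be triggered in this sketch.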