b9509
📦 llama-cpp
🐛 1 fix
Summary
This release avoids a redundant checkpoint restore in the server when a request still has new tokens to process. It also provides pre-built binaries for various operating systems and hardware configurations.
🐛 Bug Fixes
- Avoided unnecessary checkpoint restore when new tokens are present by conditionally applying the -1 offset in pos_min_thold calculation only when n_past >= task.n_tokens().
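The condition described in this fix can be illustrated with a minimal sketch. This is not the actual server code: the helper name `compute_pos_min_thold`, its parameters, and the window-size term `n_window` are assumptions made for illustration, and `n_task_tokens` stands in for `task.n_tokens()`. The sketch assumes the -1 offset exists because, when the whole prompt is already cached, the last token has to be re-evaluated to produce logits.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not the llama.cpp implementation): computes the
// position threshold below which a cached checkpoint would be needed.
//   n_past        - number of prompt tokens reusable from the cache
//   n_task_tokens - total tokens in the task (stand-in for task.n_tokens())
//   n_window      - assumed attention window size used in the threshold
int compute_pos_min_thold(int n_past, int n_task_tokens, int n_window) {
    assert(n_past >= 0 && n_task_tokens >= 0 && n_window >= 0);

    // Apply the -1 offset only when no new tokens remain, i.e. the cache
    // already covers the whole task. If new tokens are present, the offset
    // is skipped so the threshold is not lowered unnecessarily.
    const int offset = (n_past >= n_task_tokens) ? 1 : 0;

    return std::max(0, n_past - n_window - offset);
}
```

With these assumed semantics, a task whose tokens are all cached gets a threshold one position lower than a task with the same `n_past` that still has new tokens to process, so only the former case can trigger the extra restore.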