b9274

📅 May 21, 2026 · 📦 llama-cpp

🐛 2 fixes · 🔧 3 symbols

Summary

This release fixes a critical VRAM leak that occurred during server sleep/resume cycles with Multi-Token Prediction (MTP) models. The fix improves resource cleanup in destroy(). The release also ships pre-compiled binaries for multiple platforms and hardware configurations.

🐛 Bug Fixes

Fixed a VRAM leak on server sleep/resume cycles for MTP models by ensuring the speculative decoder (spec), draft context (ctx_dft), and draft model (model_dft) are properly freed.
Ensured proper cleanup order in destroy() by resetting spec, ctx_dft, and model_dft before resetting llama_init to prevent potential use-after-free errors.
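
The cleanup order described above can be illustrated with a minimal, self-contained sketch. It is not the llama.cpp implementation: mock_resource, mock_server_context, load(), and run_cycles() are hypothetical stand-ins, and only the member names (spec, ctx_dft, model_dft, llama_init) are taken from these notes. The relative order of the three draft-related resets is an assumption; the notes only state that all three happen before llama_init is reset.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Records the order in which mock resources are released.
static std::vector<std::string> g_release_log;

// Hypothetical stand-in for a GPU-backed resource (not a llama.cpp type).
struct mock_resource {
    std::string name;
    explicit mock_resource(std::string n) : name(std::move(n)) {}
    ~mock_resource() { g_release_log.push_back(name); }
};

// Hypothetical server context; member names mirror the release notes only.
struct mock_server_context {
    std::unique_ptr<mock_resource> llama_init; // owns the main model/context
    std::unique_ptr<mock_resource> model_dft;  // draft model
    std::unique_ptr<mock_resource> ctx_dft;    // draft context
    std::unique_ptr<mock_resource> spec;       // speculative decoder

    // Simulates resume: allocate every resource.
    void load() {
        llama_init = std::make_unique<mock_resource>("llama_init");
        model_dft  = std::make_unique<mock_resource>("model_dft");
        ctx_dft    = std::make_unique<mock_resource>("ctx_dft");
        spec       = std::make_unique<mock_resource>("spec");
    }

    // Simulates sleep: release draft-related resources first, then llama_init.
    // The order among the first three resets is an assumption of this sketch.
    void destroy() {
        spec.reset();
        ctx_dft.reset();
        model_dft.reset();
        llama_init.reset();
    }
};

// Runs n load/destroy cycles and returns the recorded release order.
std::vector<std::string> run_cycles(int n) {
    g_release_log.clear();
    mock_server_context ctx;
    for (int i = 0; i < n; ++i) {
        ctx.load();
        ctx.destroy();
    }
    return g_release_log;
}
```

Calling run_cycles(2) returns eight entries, four per cycle, with llama_init always released last. In this mock, every resource allocated in a cycle is released in that same cycle, which is the property the fix restores for the draft-related resources.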

Affected Symbols

destroy()
server_context_impl
llama_init.reset()