Change8

b9274

📦 llama-cpp
🐛 2 fixes · 🔧 3 symbols

Summary

This release fixes a critical VRAM leak that occurred during server sleep/resume cycles for Multi-Token Prediction (MTP) models. The fix improves resource cleanup in destroy(). The release also ships pre-compiled binaries for a range of platforms and hardware configurations.

🐛 Bug Fixes

  • Fixed a VRAM leak on server sleep/resume cycles for MTP models by ensuring the speculative decoder (spec), draft context (ctx_dft), and draft model (model_dft) are properly freed.
  • Ensured proper cleanup order in destroy() by resetting spec, ctx_dft, and model_dft before resetting llama_init to prevent potential use-after-free errors.
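The cleanup order described above can be illustrated with a minimal, self-contained sketch. It is not the actual llama.cpp server code: `tracked`, `server_state`, and `g_log` are hypothetical stand-ins, the use of `std::unique_ptr` is an assumption, and the relative order of the three draft-related resets simply follows the order in which the notes list them. Only the member names (spec, ctx_dft, model_dft, llama_init) and the rule that llama_init is reset last come from the release notes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Records the order in which resources are freed, so the sketch can be checked.
static std::vector<std::string> g_log;

// Hypothetical stand-in for a GPU-backed resource; logs its name when destroyed.
struct tracked {
    std::string name;
    explicit tracked(std::string n) : name(std::move(n)) {}
    ~tracked() { g_log.push_back(name); }
};

// Hypothetical stand-in for the server state. Member names follow the release
// notes; the types and ownership model are assumptions made for illustration.
struct server_state {
    std::unique_ptr<tracked> llama_init = std::make_unique<tracked>("llama_init");
    std::unique_ptr<tracked> model_dft  = std::make_unique<tracked>("model_dft");
    std::unique_ptr<tracked> ctx_dft    = std::make_unique<tracked>("ctx_dft");
    std::unique_ptr<tracked> spec       = std::make_unique<tracked>("spec");

    // Free the draft-related objects first, then the object they may depend on.
    void destroy() {
        spec.reset();       // speculative decoder
        ctx_dft.reset();    // draft context
        model_dft.reset();  // draft model
        llama_init.reset(); // reset last, so nothing freed earlier outlives it
    }
};
```

In this sketch, calling destroy() on a fresh `server_state` releases every owned object explicitly, which is the property that matters across repeated sleep/resume cycles: nothing draft-related is left allocated, and nothing is freed after the object it may reference.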

Affected Symbols