b8121

📅 Feb 21, 2026 📦 llama-cpp

✨ 1 feature 🐛 1 fix 🔧 2 symbols

Summary

This release improves CUDA graph capture by delaying activation until the graph is confirmed to be stable. This avoids wasted capture overhead during prompt processing and allows graphs to be re-enabled once they stabilize. It also includes a minor cleanup that removes em dashes.

Migration Steps

No migration steps are required for API users; the behavior change is internal to CUDA graph management.

✨ New Features

Improved CUDA graph capture logic: capture is now delayed until a warmup completes, which requires two matching calls before capture starts. This avoids capture overhead on graphs that are still changing and allows graphs to be re-enabled after they stabilize.
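The warmup rule can be pictured as a small gate that counts how many consecutive calls have seen identical graph properties. The C++ sketch below is hypothetical: `GraphProps`, `GraphWarmupGate`, and `run_trace` are illustrative names, not ggml APIs; the two compared fields stand in for whatever properties ggml-cuda actually checks; and reading "two matching calls" as a run of two consecutive calls with identical properties is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical snapshot of the properties compared between calls.
// The real ggml-cuda code checks its own set of node/tensor properties;
// these two fields are placeholders for illustration.
struct GraphProps {
    int n_nodes;
    std::int64_t batch_size;

    bool operator==(const GraphProps& other) const {
        return n_nodes == other.n_nodes && batch_size == other.batch_size;
    }
};

// Hypothetical warmup gate: a CUDA graph is used only once the same
// properties have been seen on kRequiredMatches consecutive calls.
class GraphWarmupGate {
public:
    // Returns true if this call should capture/replay a CUDA graph,
    // false if kernels should be launched individually.
    bool on_call(const GraphProps& props) {
        if (has_prev_ && props == prev_) {
            ++streak_;    // properties unchanged: extend the stable run
        } else {
            streak_ = 1;  // properties changed (or first call): restart warmup
        }
        prev_ = props;
        has_prev_ = true;
        // No permanent "disabled" flag: a new stable run re-enables graphs.
        return streak_ >= kRequiredMatches;
    }

private:
    static constexpr int kRequiredMatches = 2;  // assumed reading of "two matching calls"
    GraphProps prev_{};
    bool has_prev_ = false;
    int streak_ = 0;
};

// Feeds a sequence of per-call properties through a fresh gate and
// records whether each call would use a graph.
std::vector<bool> run_trace(const std::vector<GraphProps>& trace) {
    GraphWarmupGate gate;
    std::vector<bool> use_graph;
    use_graph.reserve(trace.size());
    for (const GraphProps& props : trace) {
        use_graph.push_back(gate.on_call(props));
    }
    return use_graph;
}
```

Because the gate restarts its count on every mismatch instead of latching a disabled flag, calls with churning properties never pay for a capture, and a workload that later settles starts using graphs again on its second matching call.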

🐛 Bug Fixes

Fixed CUDA graphs being permanently disabled when their properties changed frequently; graphs can now be re-enabled once the properties stabilize. This resolves problems such as those described in discussion #19708.
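To illustrate why a permanent disable is a problem, the sketch below runs two simplified policies over the same hypothetical trace: six calls with changing properties followed by four identical calls. `StickyPolicy` models switching graphs off for good after too many consecutive changes; its threshold of 4 and the integer property key are illustrative assumptions, not the logic or values used in earlier llama.cpp releases. `RecoverablePolicy` follows the two-matching-calls warmup described in this release. All names here are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Simplified model of a policy that latches graphs off for good after too
// many consecutive property changes. The threshold is illustrative only.
struct StickyPolicy {
    static constexpr int kMaxConsecutiveChanges = 4;
    int prev = 0;
    bool has_prev = false;
    int consecutive_changes = 0;
    bool disabled = false;

    bool on_call(int props_key) {
        if (disabled) {
            return false;  // permanently off, even if properties later settle
        }
        if (has_prev && props_key != prev) {
            if (++consecutive_changes >= kMaxConsecutiveChanges) {
                disabled = true;
            }
        } else {
            consecutive_changes = 0;
        }
        prev = props_key;
        has_prev = true;
        // In this model a graph is used on every call until disabled,
        // including calls whose properties are still churning.
        return !disabled;
    }
};

// Warmup-based policy: use a graph only after two consecutive calls with
// matching properties; a mismatch just restarts the warmup.
struct RecoverablePolicy {
    int prev = 0;
    bool has_prev = false;
    int streak = 0;

    bool on_call(int props_key) {
        streak = (has_prev && props_key == prev) ? streak + 1 : 1;
        prev = props_key;
        has_prev = true;
        return streak >= 2;
    }
};

// Runs one policy over a trace of hypothetical integer property keys and
// records whether each call would use a graph.
template <typename Policy>
std::vector<bool> run(const std::vector<int>& keys) {
    Policy policy;
    std::vector<bool> use_graph;
    for (int key : keys) {
        use_graph.push_back(policy.on_call(key));
    }
    return use_graph;
}
```

With a key trace of `{1, 2, 3, 4, 5, 6, 7, 7, 7, 7}`, `StickyPolicy` stops using graphs on the fifth call and never recovers, while `RecoverablePolicy` skips graphs during the churn and uses them from the second identical call onward.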

Affected Symbols

ggml_backend_cuda_graph_compute (ggml/src/ggml-cuda/ggml-cuda.cu)