
b8121

📦 llama-cpp
1 feature 🐛 1 fix 🔧 2 symbols

Summary

This release changes when CUDA graph capture is activated. Capture is now delayed until the graph is confirmed stable, which avoids wasted capture overhead while the graph is still changing (for example during prompt processing) and allows graphs to re-enable once they stabilize. It also includes a minor cleanup that removes em dashes.

Migration Steps

  1. No explicit migration steps required for API users; behavior change is internal to CUDA graph management.

✨ New Features

  • CUDA graph capture is now delayed until warmup completes: two matching calls are required before capture starts. This avoids capture overhead on unstable graphs and allows capture to re-enable after the graph stabilizes.
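
The warmup idea above can be illustrated with a minimal sketch. This is not the llama.cpp implementation: `GraphKey`, `GraphWarmupGate`, and `simulate` are hypothetical names, and the sketch assumes that a call "matches" when its key equals the key of the immediately preceding call and that capture is allowed after two such matches in a row. The real criterion may differ.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the properties that identify a compute graph.
struct GraphKey {
    std::uint64_t topology_hash;
    int           n_nodes;

    bool operator==(const GraphKey & other) const {
        return topology_hash == other.topology_hash && n_nodes == other.n_nodes;
    }
};

// Hypothetical warmup gate: capture is allowed only after `required_matches`
// consecutive calls whose key equals the previous call's key.
class GraphWarmupGate {
public:
    explicit GraphWarmupGate(int required_matches = 2) : required_(required_matches) {}

    // Records one call and returns true if the graph has been stable
    // long enough for capture to be used on this call.
    bool observe(const GraphKey & key) {
        if (has_last_ && key == last_) {
            if (matches_ < required_) {
                ++matches_;
            }
        } else {
            // A change restarts warmup; it never disables capture for good.
            matches_ = 0;
        }
        last_     = key;
        has_last_ = true;
        return matches_ >= required_;
    }

private:
    int      required_;
    int      matches_  = 0;
    bool     has_last_ = false;
    GraphKey last_{};
};

// Feeds a sequence of keys through a fresh gate and records each decision.
std::vector<bool> simulate(const std::vector<GraphKey> & calls) {
    GraphWarmupGate   gate;
    std::vector<bool> decisions;
    for (const GraphKey & key : calls) {
        decisions.push_back(gate.observe(key));
    }
    return decisions;
}
```

With two distinct keys `a` and `b`, `simulate({a, a, a, b, b, b})` yields `false, false, true, false, false, true`: capture turns on once the graph repeats, turns off when it changes, and turns back on after it stabilizes again.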

🐛 Bug Fixes

  • Fixed CUDA graphs being permanently disabled when graph properties changed frequently, resolving problems like those described in discussion #19708.

Affected Symbols