b8121
📦 llama-cpp
✨ 1 feature · 🐛 1 fix · 🔧 2 symbols
Summary
This release delays CUDA graph capture until the graph has proven stable. This avoids wasted capture overhead during prompt processing and lets graphs be re-enabled once they stabilize. It also includes a minor cleanup that removes em dashes.
Migration Steps
- No explicit migration steps required for API users; behavior change is internal to CUDA graph management.
✨ New Features
- Improved CUDA graph capture logic: capture is now delayed until warmup completes (two matching calls are required before capture starts). This avoids capture overhead on unstable graphs and allows graphs to be re-enabled after they stabilize.
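
The warmup rule described above can be illustrated with a minimal sketch. This is not the code from the release: `GraphWarmupTracker`, `observe`, `kRequiredMatches`, and the idea of reducing the graph properties to a single integer fingerprint are all hypothetical names and assumptions chosen for illustration. It also assumes that "two matching calls" means two consecutive calls whose properties match the call before them.

```cpp
#include <cstdint>

// Hypothetical warmup tracker; names are illustrative, not taken from ggml-cuda.
struct GraphWarmupTracker {
    // Assumption: capture starts after two consecutive matching calls.
    static constexpr int kRequiredMatches = 2;

    std::uint64_t last_key = 0;      // fingerprint of the previous call's graph properties
    bool          has_last = false;  // false until the first call has been observed
    int           stable_calls = 0;  // consecutive calls that matched their predecessor

    // Record one graph evaluation. Returns true once the graph has been
    // stable long enough that capturing it is likely to pay off.
    bool observe(std::uint64_t key) {
        if (has_last && key == last_key) {
            ++stable_calls;
        } else {
            // Properties changed: restart warmup instead of disabling
            // graphs permanently, so capture can resume later.
            stable_calls = 0;
        }
        last_key = key;
        has_last = true;
        return stable_calls >= kRequiredMatches;
    }
};
```

In this sketch a changing graph (as during prompt processing) keeps resetting the counter, so no capture is attempted, while a graph that settles into a repeating shape becomes eligible again after two matching calls.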
🐛 Bug Fixes
- Fixed CUDA graphs being permanently disabled when graph properties changed frequently, resolving problems like those described in discussion #19708.