b8863
📦 llama-cpp
🐛 2 fixes · 🔧 1 symbol
Summary
This release improves ggml-cuda stability under memory pressure by flushing the legacy memory pool and retrying allocation on out-of-memory (OOM) errors, and it addresses review comments related to synchronization and cleanup.
🐛 Bug Fixes
- ggml-cuda: Flush legacy pool on OOM and retry to improve stability during memory pressure.
- Address review comments: added explicit sync, updated destructor, and cleaned up MUSA macros in ggml-cuda implementation.