b9862
📦 llama-cpp
🐛 1 fix 🔧 2 symbols
Summary
This release optimizes performance by removing redundant CUDA memory copies within the gated_delta_net processing path. It also provides updated pre-compiled binaries for numerous platforms and hardware configurations.
🐛 Bug Fixes
- Removed redundant CUDA copies after the gated_delta_net operation. The kernel now writes state snapshots directly into the recurrent cache when it is safe to do so, which avoids the intermediate tail writes and the separate copy kernels.
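The pattern behind this fix can be illustrated with a small CPU-side sketch in C++. It is not the llama.cpp implementation: `toy_state_update`, `update_via_scratch`, `update_state`, and `ranges_overlap` are hypothetical names, the update formula is a placeholder rather than the real gated_delta_net math, and the "safe" condition is assumed here to mean that the destination cache slot does not overlap the state being read. The release note does not say what the actual safety check is.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the recurrent update (NOT the real gated_delta_net math).
// Reads the previous state from `prev` and writes the new snapshot to `dst`.
static void toy_state_update(const float * prev, float * dst, std::size_t n,
                             float gate, float delta) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = gate * prev[i] + delta;
    }
}

// Two-step path: write the snapshot to a scratch buffer, then copy it into the cache slot.
// Returns the number of extra copy passes performed (always 1 here).
static int update_via_scratch(const float * prev, float * cache_slot, std::size_t n,
                              float gate, float delta) {
    std::vector<float> scratch(n);
    toy_state_update(prev, scratch.data(), n, gate, delta);
    std::copy(scratch.begin(), scratch.end(), cache_slot);
    return 1;
}

// Assumed safety rule for this sketch: a direct write is allowed only if the
// source and destination ranges are disjoint. std::less gives a well-defined
// ordering even for pointers into different allocations.
static bool ranges_overlap(const float * a, const float * b, std::size_t n) {
    std::less<const float *> lt;
    return lt(a, b + n) && lt(b, a + n);
}

// Direct path: write straight into the cache slot when the assumed safety rule
// holds, otherwise fall back to the scratch-and-copy path.
// Returns the number of extra copy passes performed (0 or 1).
static int update_state(const float * prev, float * cache_slot, std::size_t n,
                        float gate, float delta) {
    assert(prev != nullptr && cache_slot != nullptr);
    if (!ranges_overlap(prev, cache_slot, n)) {
        toy_state_update(prev, cache_slot, n, gate, delta);
        return 0;
    }
    return update_via_scratch(prev, cache_slot, n, gate, delta);
}
```

Calling `update_state` with a cache slot that is separate from the input state produces the same values as `update_via_scratch` but reports zero extra copies. Passing the same buffer as both source and destination takes the fallback path. On a GPU the saving would come from skipping a device-to-device copy and its kernel launch, which this CPU sketch only models as a counter.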