Change8

b9862

📦 llama-cpp
🐛 1 fix · 🔧 2 symbols

Summary

This release optimizes performance by removing redundant CUDA memory copies within the gated_delta_net processing path. It also provides updated pre-compiled binaries for numerous platforms and hardware configurations.

🐛 Bug Fixes

  • Removed redundant CUDA copies after the gated_delta_net operation: the kernel now writes state snapshots directly into the recurrent cache when it is safe to do so, avoiding intermediate tail writes and separate copy kernels.
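
The idea behind this fix can be illustrated with a toy model. The sketch below is not llama.cpp code: the function names, the list-based cache, the elementwise update rule, and the `safe` flag are all hypothetical stand-ins, and the release notes do not say how the real kernel decides that a direct write is safe. It only contrasts a staged path (write to a temporary buffer, then copy into the cache) with a direct path that writes into the cache slot in place.

```python
# Toy model only: every name here is hypothetical, not the llama.cpp API.

def update(state, x, decay):
    """Elementwise stand-in for one recurrent step: new = decay * old + x."""
    return [decay * s + xi for s, xi in zip(state, x)]

def step_staged(cache, slot, x, decay, stats):
    """Write the snapshot to a temporary buffer, then copy it into the cache."""
    tmp = update(cache[slot], x, decay)   # intermediate write
    cache[slot][:] = tmp                  # extra copy into the cache
    stats["copies"] += 1

def step_direct(cache, slot, x, decay, stats):
    """Write each new element straight into the cache slot, no staging buffer."""
    row = cache[slot]
    for i, xi in enumerate(x):
        row[i] = decay * row[i] + xi      # element i is read before it is overwritten

def step(cache, slot, x, decay, stats, safe):
    """Use the direct path when the caller marks it safe (assumed flag), else stage."""
    (step_direct if safe else step_staged)(cache, slot, x, decay, stats)

# Run the same update through both paths and compare.
cache_a = [[1.0, 2.0], [0.0, 0.0]]
cache_b = [row[:] for row in cache_a]
stats_a = {"copies": 0}
stats_b = {"copies": 0}
step(cache_a, 0, [0.5, 0.5], 0.5, stats_a, safe=False)
step(cache_b, 0, [0.5, 0.5], 0.5, stats_b, safe=True)
```

Both paths leave the cache in the same state; the direct path simply skips the copy. In this toy the in-place write is trivially safe because the update is elementwise, which is an assumption of the sketch rather than a statement about the actual kernel.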

Affected Symbols