b9591
llama-cpp
⚠ 1 breaking · ✨ 1 feature · 🐛 2 fixes · 🔧 2 symbols
Summary
This release removes the padding workaround and the multiple device-to-device (D2D) copies previously used for MTP (multi-token prediction), and changes the ggml_gated_delta_net interface. It also fixes CI build errors.
⚠️ Breaking Changes
- The ggml_gated_delta_net signature has changed. It now takes only the initial recurrent state, with shape (D, 1, n_seqs), and expects the snapshot count K as an op parameter. Previously K was inferred from state->ne[1].
Migration Steps
- If you call ggml_gated_delta_net directly, pass only the initial recurrent state with shape (D, 1, n_seqs) and supply the snapshot count K as an op parameter. K is no longer inferred from state->ne[1].
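
The shape contract behind this migration can be sketched in plain C. This is a minimal illustration, not the ggml API: `mock_tensor`, the helper names, and the choice of op-parameter slot 0 are all hypothetical, and the real argument list of ggml_gated_delta_net should be taken from the ggml headers for this build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a tensor: only the fields this sketch needs.
 * This is NOT the real struct ggml_tensor. */
typedef struct {
    int64_t ne[4];        /* extent of each dimension */
    int32_t op_params[4]; /* per-op integer parameters */
} mock_tensor;

/* Old convention: state has shape (D, K, n_seqs), so K is read from ne[1]. */
static int64_t snapshot_count_old(const mock_tensor *state) {
    return state->ne[1];
}

/* New convention: K is stored on the op node as an op parameter.
 * Slot 0 is an arbitrary choice made for this sketch. */
static void set_snapshot_count_new(mock_tensor *op, int32_t K) {
    assert(K >= 1);
    op->op_params[0] = K;
}

static int64_t snapshot_count_new(const mock_tensor *op) {
    return op->op_params[0];
}

/* New convention: the state carries only the initial state, shape (D, 1, n_seqs). */
static bool state_is_initial_only(const mock_tensor *state) {
    return state->ne[1] == 1;
}
```

Under these assumptions, a caller that used to allocate a (D, K, n_seqs) state would now allocate (D, 1, n_seqs) and pass K separately, so a state tensor with ne[1] greater than 1 is a sign that a call site has not been migrated.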
✨ New Features
- Removed the padding hack. All emitted snapshots are now copied into the recurrent cache with a single strided ggml_cpy operation.
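
To illustrate what a single strided copy does, here is a minimal sketch in plain C. It models the idea only: the function name, the float element type, and the cache layout (one snapshot per row, rows separated by a fixed stride) are assumptions for illustration and are not taken from the llama.cpp source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n_rows rows of row_len floats from src to dst.
 * Strides are measured in floats and may exceed row_len, so the
 * destination rows need not be contiguous.
 * Models the effect of one strided copy; not the ggml implementation. */
static void copy_rows_strided(float *dst, size_t dst_stride,
                              const float *src, size_t src_stride,
                              size_t row_len, size_t n_rows) {
    assert(dst_stride >= row_len && src_stride >= row_len);
    for (size_t r = 0; r < n_rows; ++r) {
        memcpy(dst + r * dst_stride, src + r * src_stride,
               row_len * sizeof(float));
    }
}
```

With K snapshots of D floats each, one call with row_len = D and n_rows = K places every snapshot in its cache slot, where the earlier approach needed a separate copy per snapshot.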
🐛 Bug Fixes
- Fixed CI build errors.
- Applied the required gated delta net (GDN) changes across all backends and addressed review comments.