b9591

Breaking Changes
📦 llama-cpp
1 breaking · 1 feature · 🐛 2 fixes · 🔧 2 symbols

Summary

This release removes the padding workaround and the multiple device-to-device (D2D) copies previously used for MTP, and updates the ggml_gated_delta_net interface. It also fixes CI build errors.

⚠️ Breaking Changes

  • The ggml_gated_delta_net function signature has changed. It now takes only the initial recurrent state, with shape (D, 1, n_seqs), and expects the snapshot count K as an op parameter. Previously, K was inferred from state->ne[1].

Migration Steps

  1. If you call ggml_gated_delta_net directly, update the call to pass only the initial recurrent state, with shape (D, 1, n_seqs), and supply the snapshot count K as an op parameter instead of relying on state->ne[1].
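
The difference between the two conventions can be sketched with a toy model. This is only an illustration under stated assumptions: `toy_tensor`, `snapshot_count_inferred`, `snapshot_count_explicit`, and the use of parameter slot 0 are hypothetical names and choices made for this sketch, not the real ggml definitions or the actual ggml_gated_delta_net signature. Check the ggml headers for the real call.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a tensor node. NOT the real ggml_tensor definition.
struct toy_tensor {
    int64_t ne[4];        // shape, innermost dimension first
    int32_t op_params[4]; // small per-op parameter slots
};

// Old convention (as described above): the state has shape (D, K, n_seqs)
// and the snapshot count K is read from the state's second dimension.
inline int64_t snapshot_count_inferred(const toy_tensor & state) {
    return state.ne[1];
}

// New convention (as described above): the state has shape (D, 1, n_seqs)
// and K travels with the op. Slot 0 is an assumption made for this sketch.
inline int64_t snapshot_count_explicit(const toy_tensor & state, const toy_tensor & op) {
    assert(state.ne[1] == 1 && "state must hold only the initial recurrent state");
    return op.op_params[0];
}
```

Code that previously sized its state tensor with K in the second dimension would, in this model, shrink that dimension to 1 and carry K separately.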

✨ New Features

  • Removed padding hack and implemented copying of all emitted snapshots into the recurrent cache using a single strided ggml_cpy operation.
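
To illustrate why one strided copy can replace several separate copies, here is a minimal sketch on plain float buffers. It does not use ggml or ggml_cpy. The helper `strided_copy` and the layout (K snapshots of D floats packed contiguously, cache slots spaced `dst_stride` floats apart) are assumptions chosen for the example, not the layout used by llama.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Copy n_rows rows of row_len floats each. Consecutive rows start
// src_stride (resp. dst_stride) elements apart in their buffers.
// Hypothetical helper for illustration only.
inline void strided_copy(const float * src, std::size_t src_stride,
                         float * dst, std::size_t dst_stride,
                         std::size_t n_rows, std::size_t row_len) {
    assert(src_stride >= row_len && dst_stride >= row_len);
    for (std::size_t r = 0; r < n_rows; ++r) {
        for (std::size_t i = 0; i < row_len; ++i) {
            dst[r * dst_stride + i] = src[r * src_stride + i];
        }
    }
}
```

With K = 3 snapshots of D = 2 floats, a single call `strided_copy(src, 2, cache, 4, 3, 2)` places every snapshot in its own cache slot, where issuing one copy per snapshot would take three calls.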

🐛 Bug Fixes

  • Fixed CI build errors.
  • Applied the required gated delta net (GDN) changes across all backends and addressed review comments.

Affected Symbols