b9606

📦 llama-cpp
3 features · 🐛 4 fixes · 🔧 6 symbols

Summary

This release adds support for EAGLE3 speculative decoding, along with internal cleanups and parameter renames in the llama and hparams modules. Several platform-specific builds have been temporarily disabled.

Migration Steps

  1. Replace any calls to `common_speculative_setup_draft_model()`; the function has been removed.
  2. Update parameter usage from `n_embd_target_features` to `n_embd_inp` in hparams.
  3. Remove usage of `target_hidden_size` parameter in hparams.
  4. Rename `output_layer_inp` to `embeddings_layer_inp` in cparams.
  5. Reuse `ATTN_NORM_2` instead of adding a new hidden norm in arch definitions.
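
The renames and removals above can be checked mechanically before building against this release. The following is a minimal sketch, not part of llama.cpp: the helper name `find_stale_identifiers` is hypothetical, and the identifier lists are taken only from the steps above. It scans source text for the old names and reports what to do with each.

```python
import re

# Identifier changes taken from the migration steps above.
RENAMED = {
    "n_embd_target_features": "n_embd_inp",
    "output_layer_inp": "embeddings_layer_inp",
}
REMOVED = ["common_speculative_setup_draft_model", "target_hidden_size"]


def find_stale_identifiers(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, identifier, hint) for each outdated identifier.

    Hypothetical helper for illustration; it is not shipped with llama.cpp.
    """
    hints = {old: f"rename to {new}" for old, new in RENAMED.items()}
    hints.update({name: "removed, needs manual replacement" for name in REMOVED})
    # Whole-word match, so a longer name that merely contains an old
    # identifier (e.g. `my_output_layer_inp_x`) is not reported.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, hints)) + r")\b")
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(1), hints[match.group(1)]))
    return hits
```

Calling it on the contents of each file in a downstream project gives a list of lines to review. The two removed identifiers have no drop-in replacement listed in these notes, so the sketch only flags them.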

✨ New Features

  • Added support for EAGLE3 speculative decoding.
  • Enabled layer input extraction for llama models.
  • Added support for the eagle3 architecture, including Gemma4 eagle3 from RedHatAI.
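
For readers new to speculative decoding, the general draft-and-verify idea can be sketched as below. This is a toy illustration with greedy decoding, not the llama.cpp implementation: `speculative_step`, `target`, and `draft` are hypothetical names, and the models are plain Python callables. EAGLE-style methods additionally condition the draft head on hidden features taken from the target model, which this sketch omits.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to the next token id.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], n_draft: int) -> List[int]:
    """Run one greedy draft-and-verify round and return the accepted tokens."""
    # 1. The draft model proposes n_draft tokens autoregressively.
    proposed: List[int] = []
    for _ in range(n_draft):
        proposed.append(draft(prefix + proposed))

    # 2. The target checks each proposal in order. A real implementation
    #    scores all positions in one batched forward pass; the loop here
    #    is for clarity only.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # substitute the target's token and stop
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted
```

With greedy decoding the accepted tokens are exactly what the target alone would have produced; the gain comes from verifying several draft tokens per target pass instead of generating one token per pass.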

🐛 Bug Fixes

  • Fixed parameters bug in eagle3 implementation.
  • Fixed ubatch handling in embd_layer_inp extraction and encoder for eagle3.
  • Fixed multi-seq issue in d2t vocab mapping for eagle3.
  • Fixed rebase issues and adapted eagle3 to upstream changes.
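
As background for the d2t item: a draft model with a reduced vocabulary needs a draft-to-target (d2t) table to translate its token ids into the target model's vocabulary. The sketch below assumes the offset convention used by the reference EAGLE3 checkpoints, where `target_id = draft_id + d2t[draft_id]`; that convention, and the function names, are assumptions here and are not taken from the llama.cpp source. It applies the mapping independently per sequence.

```python
from typing import Dict, List, Sequence


def draft_to_target(draft_ids: Sequence[int], d2t: Sequence[int]) -> List[int]:
    """Map draft-vocab token ids to target-vocab ids via per-token offsets."""
    # Assumption: d2t[i] is an offset, so target_id = draft_id + d2t[draft_id].
    return [i + d2t[i] for i in draft_ids]


def map_per_sequence(drafts_by_seq: Dict[int, Sequence[int]],
                     d2t: Sequence[int]) -> Dict[int, List[int]]:
    """Apply the d2t mapping separately to each sequence's draft tokens."""
    return {seq_id: draft_to_target(ids, d2t)
            for seq_id, ids in drafts_by_seq.items()}
```

Keeping the mapping a pure function of the draft ids means it does not depend on how many sequences share a batch.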

Affected Symbols