b9606
📦 llama-cpp
✨ 3 features · 🐛 4 fixes · 🔧 6 symbols
Summary
This release introduces comprehensive support for EAGLE3 speculative decoding, alongside numerous internal cleanups and parameter renaming across llama and hparams modules. Several platform-specific builds have been temporarily disabled.
Migration Steps
- Replace any calls to `common_speculative_setup_draft_model()`; the function has been removed.
- Rename `n_embd_target_features` to `n_embd_inp` in hparams.
- Remove usage of `target_hidden_size` parameter in hparams.
- Rename `output_layer_inp` to `embeddings_layer_inp` in cparams.
- Reuse `ATTN_NORM_2` instead of adding a new hidden norm in arch definitions.
✨ New Features
- Added support for EAGLE3 speculative decoding.
- Enabled layer input extraction for llama models.
- Added support for the eagle3 architecture, including Gemma4 eagle3 from RedHatAI.
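
For readers new to speculative decoding, the sketch below illustrates the general greedy accept/verify step that the technique relies on. It is not llama.cpp's implementation and uses none of its API: `token_id`, `count_accepted`, and `verify_step` are hypothetical names, and EAGLE3 itself adds more on top (for example, feeding target-model hidden features to the draft model).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using token_id = std::int32_t; // hypothetical alias, not a llama.cpp type

// Counts how many drafted tokens the target model agrees with.
// draft[i]  : token proposed by the draft model at position i
// target[i] : token the target model picks greedily at position i, given
//             the context plus draft[0..i-1] (one batched forward pass)
std::size_t count_accepted(const std::vector<token_id> & draft,
                           const std::vector<token_id> & target) {
    std::size_t n = 0;
    while (n < draft.size() && n < target.size() && draft[n] == target[n]) {
        ++n;
    }
    return n;
}

// Returns the tokens emitted by one speculative step: the accepted draft
// prefix, followed by one token taken from the target model (a correction
// at the first mismatch, or a bonus token if every draft token matched).
std::vector<token_id> verify_step(const std::vector<token_id> & draft,
                                  const std::vector<token_id> & target) {
    const std::size_t n = count_accepted(draft, target);
    std::vector<token_id> out(draft.begin(),
                              draft.begin() + static_cast<std::ptrdiff_t>(n));
    if (n < target.size()) {
        out.push_back(target[n]);
    }
    return out;
}
```

With a draft of `{5, 7, 9}` and target picks of `{5, 7, 8, 3}`, the first two tokens are accepted and the target's `8` replaces the rejected `9`, so the step emits three tokens for a single target-model pass.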
🐛 Bug Fixes
- Fixed a parameter bug in the eagle3 implementation.
- Fixed ubatch handling in embd_layer_inp extraction and encoder for eagle3.
- Fixed multi-seq issue in d2t vocab mapping for eagle3.
- Fixed rebase issues and adapted eagle3 to upstream changes.
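
As background for the d2t fix above: EAGLE3 draft models typically use a reduced vocabulary, so drafted token ids have to be translated back into the target model's vocabulary. The sketch below assumes the convention used by the reference EAGLE3 checkpoints, where `d2t[i]` holds an offset that is added to draft id `i`; whether llama.cpp stores the table the same way is an assumption here, and `draft_to_target` and `map_draft_tokens` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Maps one draft-vocabulary id to a target-vocabulary id.
// Assumption: d2t[i] is the offset to add to draft id i (not an absolute id).
std::int32_t draft_to_target(const std::vector<std::int32_t> & d2t,
                             std::int32_t draft_id) {
    if (draft_id < 0 || static_cast<std::size_t>(draft_id) >= d2t.size()) {
        throw std::out_of_range("draft token id outside the d2t table");
    }
    return draft_id + d2t[static_cast<std::size_t>(draft_id)];
}

// Maps every drafted token in a list; each token is translated on its own,
// so the result does not depend on which sequence a token belongs to.
std::vector<std::int32_t> map_draft_tokens(const std::vector<std::int32_t> & d2t,
                                           const std::vector<std::int32_t> & draft_ids) {
    std::vector<std::int32_t> out;
    out.reserve(draft_ids.size());
    for (std::int32_t id : draft_ids) {
        out.push_back(draft_to_target(d2t, id));
    }
    return out;
}
```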