b9606

📅 Jun 12, 2026 · 📦 llama-cpp

✨ 3 features · 🐛 4 fixes · 🔧 6 symbols

Summary

This release adds support for EAGLE3 speculative decoding, together with internal cleanups and parameter renames in hparams and cparams. Several platform-specific builds have been temporarily disabled.

Migration Steps

Replace calls to `common_speculative_setup_draft_model()`; the function has been removed.
Rename `n_embd_target_features` to `n_embd_inp` in hparams.
Remove uses of the `target_hidden_size` parameter in hparams.
Rename `output_layer_inp` to `embeddings_layer_inp` in cparams.
Reuse `ATTN_NORM_2` instead of adding a new hidden norm in arch definitions.
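
The renames and removals above can be checked mechanically before rebuilding. The following is a minimal sketch, not part of llama.cpp: the helper name `find_stale_symbols` and the sample input are hypothetical, and the mapping only restates the old and new names listed in the migration steps. A value of `None` means the notes list the symbol as removed without naming a replacement.

```python
import re

# Old identifier -> replacement, as listed in the migration steps.
# None means the symbol was removed and no replacement is named.
MIGRATIONS = {
    "common_speculative_setup_draft_model": None,
    "n_embd_target_features": "n_embd_inp",
    "target_hidden_size": None,
    "output_layer_inp": "embeddings_layer_inp",
}


def find_stale_symbols(source: str):
    """Return (line_number, old_name, replacement) for each stale identifier."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in MIGRATIONS.items():
            # \b ensures the name is matched only as a whole identifier,
            # not as part of a longer one.
            if re.search(rf"\b{re.escape(old)}\b", line):
                hits.append((lineno, old, new))
    return hits


if __name__ == "__main__":
    # Made-up input for illustration only; not real llama.cpp code.
    sample = "n = hparams.n_embd_target_features;\nflag = cparams.output_layer_inp;\n"
    for lineno, old, new in find_stale_symbols(sample):
        action = f"rename to {new}" if new else "removed, no replacement listed"
        print(f"line {lineno}: {old} ({action})")
```

Running the same function over the contents of your own source files gives a quick list of call sites that still need attention. The `ATTN_NORM_2` step is left out because it is a design change in arch definitions rather than a rename.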

✨ New Features

Added support for EAGLE3 speculative decoding.
Enabled layer input extraction for llama models.
Added support for the eagle3 architecture, including Gemma4 eagle3 from RedHatAI.

🐛 Bug Fixes

Fixed a parameter bug in the eagle3 implementation.
Fixed ubatch handling in `embd_layer_inp` extraction and in the encoder for eagle3.
Fixed a multi-seq issue in the d2t vocab mapping for eagle3.
Fixed rebase issues and adapted eagle3 to upstream changes.

Affected Symbols

`common_speculative_setup_draft_model`
`n_embd_target_features`
`target_hidden_size`
`output_layer_inp`
`embeddings_layer_inp`
`ATTN_NORM_2`