b9481

📅 Jun 2, 2026 📦 llama-cpp

✨ 13 features 🐛 1 fix 🔧 13 symbols

Summary

This release introduces support for the new Granite Multilingual R2 embedding models, including tokenizer updates and FFN architecture changes. It also centralizes hidden activation mapping and adds support for the GGUF hidden activation key.

Migration Steps

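modern-bert: LLM_FFN_GEGLU is now assigned explicitly before the GGUF hidden-activation key is read, so a GGUF that lacks the key keeps the GEGLU behavior. If your code relies on the FFN op for modern-bert, make sure this assignment happens before the key lookup.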
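The default-then-override order in this step can be sketched as follows. This is a minimal illustration, not the llama.cpp implementation: the `sketch_*` names, the `std::map` standing in for GGUF metadata, and the `resolve_ffn_op` helper are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for llm_ffn_op_type; the real enum has more members.
enum sketch_ffn_op {
    SKETCH_FFN_NONE = 0, // sentinel meaning "not set"
    SKETCH_FFN_GEGLU,
    SKETCH_FFN_SWIGLU,
};

// Hypothetical metadata store standing in for GGUF key/value pairs.
using sketch_kv = std::map<std::string, sketch_ffn_op>;

// Assign the architecture default first, then let the key override it
// only when it is actually present in the file.
sketch_ffn_op resolve_ffn_op(const sketch_kv & kv,
                             const std::string & key,
                             sketch_ffn_op arch_default) {
    assert(arch_default != SKETCH_FFN_NONE); // a real default must be supplied

    sketch_ffn_op op = arch_default;         // step 1: explicit default
    const auto it = kv.find(key);            // step 2: optional key lookup
    if (it != kv.end()) {
        op = it->second;                     // key present: it wins
    }
    return op;
}
```

With an empty store, `resolve_ffn_op(kv, "modern-bert.hidden_activation", SKETCH_FFN_GEGLU)` returns the GEGLU default; when the key is present, the stored op is returned instead. The key string is illustrative, following the `<arch>.hidden_activation` pattern from these notes.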

✨ New Features

Added support for the ibm-granite/granite-embedding-{97m,311m}-multilingual-r2 embedding models.
For the 97m model, added a variant of the gpt4o tokenizer with a fixed regex (for better mark handling) and different token-merging settings.
For the 311m model, reused the gemma4 tokenizer.
Added support for SwiGLU FFN for Granite Embedding Multilingual R2 models.
Added a new GGUF key, <arch>.hidden_activation (LLM_KV_HIDDEN_ACT), along with its writer.
Added a forward declaration of llm_ffn_op_type to llama-hparams.h.
Added llm_ffn_op in hparams.
Added an LLM_FFN_NONE = 0 sentinel to llm_ffn_op_type, which is the value that value-initialization produces.
Centralized hidden_act mapping in llama-model.cpp and added llm_ffn_op_type_from_string() helper.
modern-bert now reads the GGUF key (when present) and uses the resulting op in its FFN graph.
Added granite-embedding-{97m,311m}-multilingual-r2 to the converter code.
Added the hashes for the granite embedding multilingual R2 models.
Set the hidden_activation in the GGUF if the field is present in config.json (such as for the granite embedding models).
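The zero-valued sentinel and the string-to-op helper listed above might look roughly like the sketch below. It is an illustration under stated assumptions rather than the actual llama.cpp code: every `sketch_*` name is hypothetical, and the string values `"gelu"` and `"silu"` and their mapping to GEGLU and SwiGLU are assumed for the example, not taken from the release.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llm_ffn_op_type. Giving the sentinel the value 0
// means a value-initialized field automatically reads as "not set".
enum sketch_ffn_op_type {
    SKETCH_FFN_NONE = 0,
    SKETCH_FFN_GEGLU,
    SKETCH_FFN_SWIGLU,
};

// Hypothetical stand-in for the hparams struct that carries the op.
struct sketch_hparams {
    sketch_ffn_op_type ffn_op; // no default member initializer on purpose
};

// One central place that maps an activation name to an FFN op.
// The accepted strings are an assumption for this sketch.
sketch_ffn_op_type sketch_ffn_op_type_from_string(const std::string & name) {
    if (name == "gelu") { return SKETCH_FFN_GEGLU;  }
    if (name == "silu") { return SKETCH_FFN_SWIGLU; }
    return SKETCH_FFN_NONE; // unknown name: caller decides how to handle it
}

// Usage sketch: value-initialize, then fill in from a string.
inline void sketch_demo() {
    sketch_hparams hp{};                  // value-initialization zeroes ffn_op
    assert(hp.ffn_op == SKETCH_FFN_NONE); // so it starts out as the sentinel

    hp.ffn_op = sketch_ffn_op_type_from_string("silu");
    assert(hp.ffn_op == SKETCH_FFN_SWIGLU);
}
```

Returning the sentinel for an unrecognized name keeps the helper free of error handling, so each caller can choose between falling back to an architecture default and rejecting the model.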

🐛 Bug Fixes

Fixed regex handling in the gpt4o tokenizer version.

Affected Symbols

ibm-granite/granite-embedding-97m-multilingual-r2
ibm-granite/granite-embedding-311m-multilingual-r2
gpt4o tokenizer
gemma4 tokenizer
LLM_KV_HIDDEN_ACT
llm_ffn_op_type
llama-hparams.h
llm_ffn_op
LLM_FFN_NONE
LLM_FFN_GEGLU
modern-bert
llm_ffn_op_type_from_string()
rope_scaling_type/llama_rope_scaling_type_from_string()