b9481
📦 llama-cpp
✨ 13 features · 🐛 1 fix · 🔧 13 symbols
Summary
This release introduces support for the new Granite Multilingual R2 embedding models, including tokenizer updates and FFN architecture changes. It also centralizes hidden activation mapping and adds support for the GGUF hidden activation key.
Migration Steps
- modern-bert: LLM_FFN_GEGLU is assigned explicitly before the GGUF is read, so modern-bert models whose GGUF lacks the new hidden activation key keep their previous behavior.
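The default-then-override pattern described in this step can be sketched as follows. This is a minimal illustration, not the llama.cpp loader: the `migration_sketch` namespace, the `metadata` map standing in for GGUF key/value data, the `resolve_ffn_op` helper, and the accepted strings `"geglu"` and `"swiglu"` are all assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace migration_sketch {

// Hypothetical, reduced enum; the real llm_ffn_op_type has more members.
enum ffn_op { FFN_NONE = 0, FFN_GEGLU, FFN_SWIGLU };

// Stand-in for GGUF metadata: key -> string value (assumption for this sketch).
using metadata = std::map<std::string, std::string>;

// Assign the GEGLU default first, then let an optional
// "<arch>.hidden_activation" entry override it.
ffn_op resolve_ffn_op(const metadata & kv, const std::string & arch) {
    ffn_op op = FFN_GEGLU; // default is set before the key is consulted

    const auto it = kv.find(arch + ".hidden_activation");
    if (it != kv.end()) {
        if (it->second == "swiglu") {
            op = FFN_SWIGLU;
        } else if (it->second == "geglu") {
            op = FFN_GEGLU;
        }
        // Unrecognized strings leave the default in place in this sketch.
    }
    return op;
}

// Small usage check: a file without the key keeps GEGLU, a file with it is overridden.
inline void self_check() {
    metadata kv;
    assert(resolve_ffn_op(kv, "modern-bert") == FFN_GEGLU);
    kv["modern-bert.hidden_activation"] = "swiglu";
    assert(resolve_ffn_op(kv, "modern-bert") == FFN_SWIGLU);
}

} // namespace migration_sketch
```

Setting the default before the lookup means a GGUF written before this release, which has no hidden activation key, resolves to the same op as before.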
✨ New Features
- Added support for the ibm-granite/granite-embedding-{97m,311m}-multilingual-r2 embedding models.
- For the 97m model, added a variant of the gpt4o tokenizer with a fixed regex for better mark handling and different token-merging settings.
- Reused gemma4 tokenizer for the 311m model.
- Added support for SwiGLU FFN for Granite Embedding Multilingual R2 models.
- Added a new GGUF key, <arch>.hidden_activation (LLM_KV_HIDDEN_ACT), together with a writer for it.
- Added a forward declaration of llm_ffn_op_type to llama-hparams.h.
- Added llm_ffn_op in hparams.
- Added an LLM_FFN_NONE = 0 sentinel to llm_ffn_op_type, so that a value-initialized field holds the sentinel.
- Centralized hidden_act mapping in llama-model.cpp and added llm_ffn_op_type_from_string() helper.
- modern-bert now reads the GGUF key (when present) and uses the resulting op in its FFN graph.
- Added granite-embedding-{97m,311m}-multilingual-r2 to the converter code.
- Added the hashes for the granite embedding multilingual R2 models.
- Set the hidden_activation in the GGUF if the field is present in config.json (such as for the granite embedding models).
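To illustrate how a zero-valued sentinel and a string-to-enum helper fit together, here is a minimal sketch. It is not the code from llama-model.cpp: the `ffn_sketch` namespace, the reduced enum, the `hparams` struct, the helper's signature, and the accepted string spellings are assumptions chosen for the example. Only the names LLM_FFN_NONE, LLM_FFN_GEGLU, and llm_ffn_op_type_from_string come from the notes above.

```cpp
#include <cassert>
#include <string>

namespace ffn_sketch {

// Reduced, illustrative enum. LLM_FFN_NONE = 0 is the sentinel from the notes;
// the other members and their order are assumptions.
enum llm_ffn_op_type {
    LLM_FFN_NONE = 0, // value-initialized fields end up here
    LLM_FFN_GEGLU,
    LLM_FFN_SWIGLU,
};

// Hypothetical stand-in for the hparams field; no default member initializer,
// so value-initialization zeroes it.
struct hparams {
    llm_ffn_op_type ffn_op;
};

// Assumed signature and spellings: map an activation name to an op,
// returning the sentinel for anything unrecognized.
inline llm_ffn_op_type llm_ffn_op_type_from_string(const std::string & name) {
    if (name == "geglu")  { return LLM_FFN_GEGLU;  }
    if (name == "swiglu") { return LLM_FFN_SWIGLU; }
    return LLM_FFN_NONE;
}

// Usage check: the sentinel doubles as "not set" and "not recognized".
inline void self_check() {
    hparams hp{}; // value-initialized, so ffn_op == LLM_FFN_NONE
    assert(hp.ffn_op == LLM_FFN_NONE);

    hp.ffn_op = llm_ffn_op_type_from_string("swiglu");
    assert(hp.ffn_op == LLM_FFN_SWIGLU);

    assert(llm_ffn_op_type_from_string("not-an-activation") == LLM_FFN_NONE);
}

} // namespace ffn_sketch
```

Keeping the mapping in one helper means each architecture can compare the result against the sentinel instead of repeating its own string comparisons.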
🐛 Bug Fixes
- Fixed regex handling in the gpt4o tokenizer version.
Affected Symbols
- ibm-granite/granite-embedding-97m-multilingual-r2
- ibm-granite/granite-embedding-311m-multilingual-r2
- gpt4o tokenizer
- gemma4 tokenizer
- LLM_KV_HIDDEN_ACT
- llm_ffn_op_type
- llama-hparams.h
- llm_ffn_op
- LLM_FFN_NONE
- LLM_FFN_GEGLU
- modern-bert
- llm_ffn_op_type_from_string()
- rope_scaling_type/llama_rope_scaling_type_from_string()