b9626

📅 Jun 13, 2026 · 📦 llama-cpp

✨ 3 features · 🐛 4 fixes · 🔧 16 symbols

Summary

This release adds architecture support for cohere2-MoE and includes internal cleanups and fixes for model loading, MTP, and sliding_window_pattern handling. Several names were also updated, including the cohere2-moe tokenizer type.

Migration Steps

The old cohere2-moe tokenizer type has been removed and replaced by tiny_aya; update anything that still references the old type. North-Mini-Code-1.0 has also been renamed.

✨ New Features

Added architecture support for cohere2-MoE.
Added support for Command models to use LayerNorm by checking for zero-bias tensors.
Added cohere2moe to Llama Model Saver supported list.

🐛 Bug Fixes

Fixed an issue with sliding_window_pattern and its pattern handling.
Fixed a transformers crash caused by a 'first_k_dense_replace' error.
Fixed an MTP failure by switching to iSWA.
Resolved the remaining TODOs related to the cohere2moe renaming and SWA parsing.
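
For readers unfamiliar with the setting touched by the sliding_window_pattern fix, here is a minimal sketch of how such a value is commonly interpreted. It assumes the convention in which every N-th layer (counting from 1) uses full attention and all other layers use sliding-window attention. The function names are hypothetical and illustrative only; this is not the code changed in this release.

```python
from typing import List


def layer_uses_sliding_window(layer_idx: int, sliding_window_pattern: int) -> bool:
    """Return True if the 0-based layer `layer_idx` uses sliding-window attention.

    Assumed convention (not taken from this release): every
    `sliding_window_pattern`-th layer, counted from 1, uses full attention.
    """
    if sliding_window_pattern <= 0:
        raise ValueError("sliding_window_pattern must be a positive integer")
    return (layer_idx + 1) % sliding_window_pattern != 0


def swa_layer_mask(n_layers: int, sliding_window_pattern: int) -> List[bool]:
    """Build a per-layer mask: True = sliding-window layer, False = full attention."""
    return [
        layer_uses_sliding_window(i, sliding_window_pattern)
        for i in range(n_layers)
    ]


if __name__ == "__main__":
    # With a pattern of 4, layers 3 and 7 (0-based) are full-attention layers.
    print(swa_layer_mask(8, 4))
```

Under this assumption, a pattern of 1 marks every layer as full attention, which is a quick sanity check when a converted model reports an unexpected attention layout.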

Affected Symbols

cohere2-MoE, ffn lookup, lmhead, modify tensors, token_embd.weight, sliding_window_pattern, first_k_dense_replace, MTP, swa parsing, get_key_or_arr, Llama Model Saver, zero-bias tensors, LayerNorm, expert_selection_fn, base.py, command.py