b9055

📅 May 7, 2026 📦 llama-cpp

✨ 1 feature 🐛 9 fixes 🔧 8 symbols

Summary

This release adds support for the Mimo v2.5 model, along with nine fixes for the new architecture covering tensor row splitting, TP dequantization, attention value scaling on the fused QKV tensor, and GGUF conversion.

✨ New Features

Added support for the Mimo v2.5 model.

🐛 Bug Fixes

Fixed modify_tensors row split issue for mimo-v2.5.
Added missing add_attn_value_scale plumbing for mimo-v2.5.
Fixed TP dequant to correctly detect TP rows for mimo-v2.5.
Fixed TP iteration order to be descending for mimo-v2.5.
Retained fused QKV for mimo-v2.5.
Fixed missed attn_value scale during merge for mimo-v2.5.
Ensured fused QKV is contiguous for scaling attention value in mimo-v2.5.
Moved handling of speech_embeddings. tensors to TextModel filter_tensors for mimo-v2.5.
Included MTP weights in gguf conversion for mimo-v2.5.
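
Several of the fixes above concern applying the attention value scale to a fused QKV tensor. The sketch below only illustrates that idea and is not the llama.cpp implementation. It assumes the fused weight is laid out as all Q rows, then all K rows, then all V rows, and that the scale multiplies only the V rows. The function name, parameters, and toy sizes are hypothetical and do not come from convert_hf_to_gguf.py or src/models/mimo2.cpp.

```python
# Hypothetical illustration only: names, layout, and sizes are assumptions,
# not taken from convert_hf_to_gguf.py or src/models/mimo2.cpp.

def scale_value_rows(fused_qkv, n_q_rows, n_k_rows, n_v_rows, value_scale):
    """Return a copy of a fused QKV matrix with only the V rows scaled.

    fused_qkv is a list of rows, assumed to be laid out as all Q rows,
    then all K rows, then all V rows.
    """
    expected = n_q_rows + n_k_rows + n_v_rows
    if len(fused_qkv) != expected:
        raise ValueError(f"expected {expected} rows, got {len(fused_qkv)}")

    v_start = n_q_rows + n_k_rows  # index of the first V row
    scaled = []
    for i, row in enumerate(fused_qkv):
        if i >= v_start:
            # V rows: multiply every element by the scale
            scaled.append([x * value_scale for x in row])
        else:
            # Q and K rows: copied unchanged
            scaled.append(list(row))
    return scaled


# Toy tensor: 2 Q rows, 1 K row, 1 V row, 2 columns each.
fused = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
result = scale_value_rows(fused, n_q_rows=2, n_k_rows=1, n_v_rows=1, value_scale=0.5)
print(result)  # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [3.5, 4.0]]
```

The sketch returns a new matrix instead of modifying the input, so the Q and K rows are never touched by the scaling step. A real converter would work on tensor objects, which is where the contiguity requirement mentioned above comes in.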

Affected Symbols

mimo-v2.5
modify_tensors
add_attn_value_scale
TP dequant
TextModel filter_tensors
src/llama-hparams.h
src/models/mimo2.cpp
convert_hf_to_gguf.py