b9055
📦 llama-cpp
Summary
This release adds support for the Mimo v2.5 model, along with fixes to tensor handling, attention value scaling, and GGUF conversion for the new architecture.
✨ New Features
- Added support for the Mimo v2.5 model.
🐛 Bug Fixes
- Fixed a row-split issue in modify_tensors for mimo-v2.5.
- Added missing add_attn_value_scale plumbing for mimo-v2.5.
- Fixed TP dequant to correctly detect TP rows for mimo-v2.5.
- Fixed TP iteration order to be descending for mimo-v2.5.
- Retained the fused QKV tensor for mimo-v2.5.
- Fixed the attn_value scale being missed during merge for mimo-v2.5.
- Ensured the fused QKV tensor is contiguous before scaling the attention value for mimo-v2.5.
- Moved the speech_embeddings. tensor filter into TextModel filter_tensors for mimo-v2.5.
- Included MTP weights in GGUF conversion for mimo-v2.5.
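Several of the fixes above concern scaling the attention value inside a fused QKV weight. As a rough illustration of why contiguity matters there, here is a minimal NumPy sketch. It assumes the fused weight stacks Q, K and V along the first axis as [Q; K; V]. That layout, the helper name scale_value_rows, and the toy shapes are all hypothetical and are not taken from llama.cpp's conversion code.

```python
import numpy as np


def scale_value_rows(qkv: np.ndarray, n_q: int, n_k: int, n_v: int, scale: float) -> np.ndarray:
    """Return a copy of a fused QKV weight with only the V rows multiplied by `scale`.

    Hypothetical layout assumption: rows are stacked as [Q; K; V] along axis 0,
    with n_q, n_k and n_v rows respectively. Expects a floating-point array.
    """
    if qkv.shape[0] != n_q + n_k + n_v:
        raise ValueError(f"expected {n_q + n_k + n_v} rows, got {qkv.shape[0]}")
    # Make a C-contiguous copy so the slice update below writes into a plain
    # row-major buffer, even if the input was a transposed or strided view,
    # and so the caller's array is left untouched.
    out = np.array(qkv, order="C", copy=True)
    v_start = n_q + n_k
    # Scale only the V block; Q and K rows are left as they were.
    out[v_start:v_start + n_v] *= scale
    return out


# Toy example (hypothetical sizes): 2 Q rows, 1 K row, 1 V row, 3 columns.
# Transposing makes the incoming view non-contiguous.
w = np.arange(12, dtype=np.float32).reshape(3, 4).T  # shape (4, 3)
scaled = scale_value_rows(w, n_q=2, n_k=1, n_v=1, scale=0.5)
print(scaled)
```

Copying into a contiguous buffer first keeps the scaling step independent of how the fused tensor happened to be laid out in memory when it arrived.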