b9085
📦 llama-cpp
✨ 2 features · 🐛 2 fixes · 🔧 2 symbols
Summary
This release adds specialized flash attention support for MiMo-V2.5 models, along with fixes for GQA handling and for the head-dimension carveouts. Pre-built binaries are provided for a range of operating systems and hardware configurations.
✨ New Features
- Added flash attention MMA/tile kernel support for MiMo-V2.5, specifically for head dimensions d_kq=192 and d_v=128.
- Implemented MiMo-V2.5 flash attention using (256, 256) fattn templates.
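The features above concern attention kernels where the query/key head dimension differs from the value head dimension (d_kq=192, d_v=128). A minimal NumPy sketch of single-head attention with unequal head dimensions illustrates why the two sizes can differ: Q and K share one dimension (used only in the dot product), while V carries its own (the shapes and values here are illustrative, not taken from the kernels themselves):

```python
import numpy as np

# Illustrative head dimensions matching the ones named in this release.
d_kq, d_v, n_tokens = 192, 128, 4

rng = np.random.default_rng(0)
Q = rng.standard_normal((n_tokens, d_kq))  # queries: (tokens, d_kq)
K = rng.standard_normal((n_tokens, d_kq))  # keys share d_kq with queries
V = rng.standard_normal((n_tokens, d_v))   # values use their own dim d_v

# Scaled dot-product scores only ever contract over d_kq.
scores = (Q @ K.T) / np.sqrt(d_kq)                        # (tokens, tokens)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)            # softmax rows

# The output inherits the value dimension, not the key dimension.
out = weights @ V                                         # (tokens, d_v)
print(out.shape)  # (4, 128)
```

Because the output width follows d_v rather than d_kq, a fused flash attention kernel must handle the two dimensions separately, which is why this shape pair needs its own kernel specialization.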
🐛 Bug Fixes
- Fixed GQA handling.
- Mirrored the 320/576 carveouts for the 192 head dimension.