
b9085

📦 llama-cpp
✨ 2 features · 🐛 2 fixes · 🔧 2 symbols

Summary

This release introduces specialized flash-attention support for MiMo-V2.5 models and includes fixes for grouped-query attention (GQA) handling and head-dimension carveouts. It also ships pre-built binaries for a range of operating systems and hardware configurations.

✨ New Features

  • Added flash attention MMA/Tiles support for MiMo-V2.5, specifically for d_kq=192 and d_v=128.
  • Implemented MiMo-V2.5 flash attention using (256, 256) fattn templates.
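Flash-attention kernels of this kind are typically compiled as templates specialized for fixed (d_kq, d_v) head-dimension pairs, with a runtime switch selecting the matching instantiation. The sketch below illustrates that dispatch pattern; all names and the set of supported pairs are hypothetical, not llama.cpp's actual internals.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical per-(d_kq, d_v) kernel selection. Real implementations
// instantiate one kernel template per supported head-dimension pair and
// fall back to a generic path when no specialization exists.
const char * select_fattn_kernel(int d_kq, int d_v) {
    if (d_kq == 128 && d_v == 128) return "fattn_128_128";
    if (d_kq == 192 && d_v == 128) return "fattn_192_128"; // pair added in this release
    if (d_kq == 256 && d_v == 256) return "fattn_256_256"; // template reused for MiMo-V2.5
    return nullptr; // no specialized kernel: caller uses the generic path
}
```

The explicit whitelist is the point: each pair costs compile time and binary size, so only dimensions used by real model families (here, d_kq=192 with d_v=128) get a dedicated kernel.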

🐛 Bug Fixes

  • Fixed grouped-query attention (GQA) handling.
  • Mirrored the existing 320/576 carveouts for the 192 head dimension.
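For context on the GQA fix: in grouped-query attention, several query heads share one key/value head, and kernels must map each query head to its KV head. A minimal sketch of that mapping, assuming the standard contiguous-group layout (this is generic background, not the specific code changed in this release):

```cpp
#include <cassert>

// Map a query head index to its shared KV head under grouped-query
// attention. Assumes n_head_q is an integer multiple of n_head_kv,
// with consecutive query heads grouped onto the same KV head.
int gqa_kv_head(int q_head, int n_head_q, int n_head_kv) {
    assert(n_head_kv > 0 && n_head_q % n_head_kv == 0);
    const int group_size = n_head_q / n_head_kv; // query heads per KV head
    return q_head / group_size;
}
```

With 32 query heads and 8 KV heads, query heads 0-3 read KV head 0, heads 4-7 read KV head 1, and so on; getting this mapping (or the associated strides) wrong is the usual source of GQA bugs in attention kernels.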

Affected Symbols