b9866
📦 llama-cpp
✨ 1 feature · 🔧 1 symbol
Summary
This release enables topk-moe fusion on CUDA for models with 288 experts, improving decode performance by about 2.4% at shallow context. Pre-built binaries are provided for a range of operating systems and hardware configurations.
✨ New Features
- Enabled topk-moe fusion for models with 288 experts on CUDA, improving decode performance by ~2.4% at shallow context.
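For readers unfamiliar with the term, the following is a minimal reference sketch of what top-k mixture-of-experts routing computes for a single token: a softmax over the router logits, selection of the k highest-scoring experts, and an optional renormalization of the selected weights. Kernel fusion typically combines steps like these into fewer GPU kernel launches. This is not the llama.cpp CUDA kernel; the function name `topk_moe_route`, the `renormalize` flag, the value k = 8, and the synthetic logits are all assumptions made for illustration. Only the expert count of 288 comes from the release note.

```python
import math

def topk_moe_route(logits, k, renormalize=True):
    """Reference (unfused) top-k MoE routing for one token.

    logits: router scores, one per expert.
    Returns (expert_ids, weights) for the k selected experts.
    Hypothetical helper, not part of llama.cpp.
    """
    # Step 1: numerically stable softmax over all expert logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Step 2: pick the k experts with the highest probability.
    # The sort key (-prob, index) sends ties to the lower index.
    order = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    expert_ids = order[:k]
    weights = [probs[i] for i in expert_ids]

    # Step 3 (optional): rescale the selected weights to sum to 1.
    if renormalize:
        s = sum(weights)
        weights = [w / s for w in weights]
    return expert_ids, weights

# Illustrative input: 288 experts as in the release note; k = 8 is assumed.
n_experts = 288
logits = [math.sin(0.37 * i) for i in range(n_experts)]
ids, w = topk_moe_route(logits, k=8)
```

In an unfused implementation each of the three steps may run as a separate pass over the data; a fused version would produce the same `ids` and `w` in a single pass.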