b9866

📅 Jul 3, 2026 · 📦 llama-cpp

✨ 1 feature · 🔧 1 symbol

Summary

This release enables topk-moe fusion on CUDA for models with 288 experts, improving decode performance by about 2.4% at shallow context. Pre-built binaries are provided for multiple operating systems and hardware configurations.

✨ New Features

Enabled topk-moe fusion for models with 288 experts on CUDA, improving decode performance by ~2.4% at shallow context.
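
For readers unfamiliar with the operation, the sketch below shows in plain Python what a top-k mixture-of-experts routing step typically computes: a softmax over the per-expert logits, selection of the k highest-scoring experts, and an optional renormalisation of their weights. A fused kernel would generally perform these steps in a single launch instead of several. This is an unfused reference under stated assumptions, not llama.cpp's CUDA kernel: the function name `topk_moe_route`, the `normalize` flag, and `TOP_K = 8` are hypothetical and chosen only for illustration. Only the expert count of 288 comes from this release.

```python
import math
import random


def topk_moe_route(logits, k, normalize=True):
    """Unfused reference: softmax over expert logits, then top-k selection.

    Hypothetical helper for illustration; not part of llama.cpp.
    """
    # Numerically stable softmax over all experts.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k experts with the highest probability, best first.
    top_ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top_ids]

    # Optionally rescale the selected weights so they sum to 1.
    if normalize:
        selected_sum = sum(weights)
        weights = [w / selected_sum for w in weights]

    return top_ids, weights


N_EXPERTS = 288  # expert count named in this release
TOP_K = 8        # assumption: the release does not state k

# Random router logits for a single token (illustrative data only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(N_EXPERTS)]

expert_ids, expert_weights = topk_moe_route(logits, TOP_K)
```

Each of the three steps reads or writes a full row of 288 values when run separately, which is the kind of intermediate traffic that fusion is usually meant to avoid.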

Affected Symbols

topk-moe fusion kernel