
b9866

📦 llama-cpp
1 feature 🔧 1 symbol

Summary

This release enables topk-moe fusion on CUDA for models with 288 experts, which improves decode performance. It also ships pre-built binaries for multiple operating systems and hardware configurations.

✨ New Features

  • Enabled topk-moe fusion for models with 288 experts on CUDA, improving decode performance by ~2.4% at shallow context.
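
For readers unfamiliar with the operation, top-k MoE routing generally means taking a softmax over the router's per-expert logits, keeping the k highest-scoring experts, and optionally renormalizing their weights. Fusion typically means doing these steps in a single GPU kernel instead of several separate ones. The following is a minimal, unfused pure-Python sketch of that routing step, not llama.cpp's CUDA kernel. The function name `topk_moe_route`, the `renormalize` flag, `k=2`, and the logit values are all assumptions chosen for illustration.

```python
import math

def topk_moe_route(logits, k, renormalize=True):
    """Unfused reference (hypothetical helper): softmax, then top-k selection."""
    # Numerically stable softmax over all experts.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k largest probabilities, highest first.
    top_ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top_ids]

    # Optionally rescale the selected weights so they sum to 1.
    if renormalize:
        s = sum(weights)
        weights = [w / s for w in weights]
    return top_ids, weights

# Made-up router output for one token across 288 experts.
logits = [0.0] * 288
logits[5], logits[17], logits[200] = 3.0, 2.0, 1.0

ids, weights = topk_moe_route(logits, k=2)
print(ids, weights)
```

In this sketch each step is a separate pass over the data. A fused kernel would aim to produce the same expert indices and weights while avoiding the intermediate buffers and extra kernel launches, which is where a decode-time gain would plausibly come from.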

Affected Symbols