
b8648

📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 3 symbols

Summary

This release introduces hardware acceleration for MoE models via the MUL_MAT_ID op in ggml-zendnn and includes a minor consistency fix in the sgemm failure condition.

Migration Steps

  1. If using MoE models with ZenDNN acceleration, note that the MUL_MAT_ID op falls back to the CPU backend when the total number of experts exceeds 32.
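The fallback above can be sketched as a simple backend-selection check. This is a minimal illustration, not the actual ggml-zendnn dispatch code: the constant, enum, and function names here are hypothetical, and only the 32-expert limit comes from the release notes.

```cpp
#include <cstdint>

// Hypothetical expert-count limit for ZenDNN MUL_MAT_ID acceleration,
// per the release notes; the real symbol name in ggml-zendnn may differ.
constexpr int64_t ZENDNN_MAX_EXPERTS = 32;

enum class Backend { ZenDNN, CPU };

// Illustrative sketch: pick the backend that runs MUL_MAT_ID for a MoE
// layer with n_experts experts. Falls back to CPU above the limit.
Backend select_mul_mat_id_backend(int64_t n_experts) {
    if (n_experts > ZENDNN_MAX_EXPERTS) {
        return Backend::CPU;  // too many experts: CPU fallback
    }
    return Backend::ZenDNN;   // within limit: accelerated path
}
```

For example, a model with 8 or 32 experts would stay on the accelerated path, while a 64-expert model would run MUL_MAT_ID on the CPU backend.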

✨ New Features

  • Added MUL_MAT_ID op acceleration support for Mixture-of-Experts (MoE) models via ggml-zendnn.
  • Updated ZenDNN library pointer to latest bits (ZenDNN-2026-WW13).

🐛 Bug Fixes

  • Added braces to the sgemm failure condition in ggml-zendnn for improved consistency.
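The change is purely stylistic. A minimal sketch of the pattern, with a hypothetical function and condition name (the actual sgemm code in ggml-zendnn is not reproduced here):

```cpp
// Illustrative sketch of the brace-consistency fix. Names are hypothetical.
bool try_sgemm(bool sgemm_supported) {
    // Before: `if (!sgemm_supported) return false;` (unbraced single statement)
    // After: braced form, matching the surrounding code style:
    if (!sgemm_supported) {
        return false;  // sgemm failure condition: bail out
    }
    return true;  // sgemm path taken
}
```

Behavior is unchanged; only the formatting of the failure branch differs.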

Affected Symbols