b8648
📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 3 symbols
Summary
This release introduces hardware acceleration for Mixture-of-Experts (MoE) models via the MUL_MAT_ID op in ggml-zendnn and includes a minor consistency fix in the sgemm failure condition.
Migration Steps
- If using MoE models with ZenDNN acceleration, note that the MUL_MAT_ID op will fall back to the CPU backend if the total number of experts exceeds 32.
✨ New Features
- Added MUL_MAT_ID op acceleration support for Mixture-of-Experts (MoE) models via ggml-zendnn.
- Updated ZenDNN library pointer to latest bits (ZenDNN-2026-WW13).
🐛 Bug Fixes
- Added braces to the sgemm failure condition in ggml-zendnn for improved consistency.