b8648
📦 llama-cpp
✨ 2 features · 🐛 1 fix · 🔧 3 symbols
Summary
This release introduces hardware acceleration for Mixture-of-Experts (MoE) models via the MUL_MAT_ID op in ggml-zendnn and includes a minor consistency fix in the sgemm failure condition.
Migration Steps
- If using MoE models with ZenDNN acceleration, note that the MUL_MAT_ID op will fall back to the CPU backend if the total number of experts exceeds 32.
✨ New Features
- Added MUL_MAT_ID op acceleration support for Mixture-of-Experts (MoE) models via ggml-zendnn.
- Updated ZenDNN library pointer to latest bits (ZenDNN-2026-WW13).
🐛 Bug Fixes
- Added braces to the sgemm failure condition in ggml-zendnn for improved consistency.