b8891
📦 llama-cpp
✨ 2 features · 🐛 2 fixes · 🔧 1 symbol
Summary
This release improves performance of the ggml-webgpu backend by introducing a fused RMS_NORM + MUL operation, and updates platform-specific binaries.
Migration Steps
- If you were relying on specific WebGPU kernel behavior, note that RMS_NORM + MUL is now fused by default; fusion can be disabled with the new GGML_WEBGPU_DISABLE_FUSION flag. Epsilon handling in the fused path was also fixed, so numerical results may differ slightly from previous builds.
✨ New Features
- Added fused RMS_NORM + MUL operation for ggml-webgpu backend.
- Introduced GGML_WEBGPU_DISABLE_FUSION flag to allow disabling kernel fusion in WebGPU builds.
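The notes do not say how GGML_WEBGPU_DISABLE_FUSION is exposed; assuming it is a CMake option alongside the existing GGML_WEBGPU backend switch, a build with fusion disabled might look like:

```shell
# Hypothetical sketch: assumes GGML_WEBGPU_DISABLE_FUSION is a CMake
# option; check the repository's build docs for the actual mechanism.
cmake -B build -DGGML_WEBGPU=ON -DGGML_WEBGPU_DISABLE_FUSION=ON
cmake --build build --config Release
```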
🐛 Bug Fixes
- Fixed epsilon handling in WebGPU fused operations.
- Removed reliance on C++20 initializers in the WebGPU implementation.
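To make the fused operation (and the role of epsilon) concrete, here is a minimal NumPy sketch of the standard RMSNorm formula followed by an elementwise multiply, the two steps the fusion combines into one kernel. Function names are illustrative, not ggml API, and the epsilon placement follows the usual RMSNorm definition rather than ggml source.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Standard RMSNorm: x / sqrt(mean(x^2) + eps), computed per row.
    # eps keeps the denominator nonzero for all-zero rows.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def rms_norm_mul(x, w, eps=1e-6):
    # Fused form: produce rms_norm(x) * w in one pass, avoiding the
    # intermediate tensor that a separate MUL op would have to read back.
    return rms_norm(x, eps) * w

x = np.array([[1.0, 2.0, 3.0, 4.0]])
w = np.array([0.5, 0.5, 0.5, 0.5])
out = rms_norm_mul(x, w)
```

The fused result is numerically identical to running the two ops in sequence; the win is avoiding an intermediate buffer and an extra kernel dispatch on the GPU.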