Change 8

b8891

📦 llama-cpp
✨ 2 features · 🐛 2 fixes · 🔧 1 symbol

Summary

This release improves performance in the ggml-webgpu backend by introducing a fused RMS_NORM + MUL operation, and includes various platform-specific binary updates.

Migration Steps

  1. If you relied on specific WebGPU fusion behavior, note that the fusion logic has changed: the explicit disable flag was removed, so fusion is now managed internally. Check the source for details on the epsilon-handling fix in fused operations.

✨ New Features

  • Added fused RMS_NORM + MUL operation for ggml-webgpu backend.
  • Introduced GGML_WEBGPU_DISABLE_FUSION flag to allow disabling kernel fusion in WebGPU builds.

🐛 Bug Fixes

  • Fixed epsilon handling in WebGPU fused operations.
  • Removed reliance on C++20 initializers in WebGPU implementation.

Affected Symbols