b8891
📦 llama-cpp
✨ 2 features · 🐛 2 fixes · 🔧 1 symbol
Summary
This release improves performance of the ggml-webgpu backend by introducing a fused RMS_NORM + MUL operation, and updates platform-specific binaries.
Migration Steps
- If you were relying on specific WebGPU kernel behavior, note that RMS_NORM + MUL is now fused by default; fusion can be disabled with the new GGML_WEBGPU_DISABLE_FUSION flag. Epsilon handling in the fused path was also fixed, so numerical results may differ slightly from previous builds.
✨ New Features
- Added fused RMS_NORM + MUL operation for ggml-webgpu backend.
- Introduced GGML_WEBGPU_DISABLE_FUSION flag to allow disabling kernel fusion in WebGPU builds.
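The notes do not say how GGML_WEBGPU_DISABLE_FUSION is exposed; assuming it is a CMake option alongside the existing GGML_WEBGPU backend switch, a build with fusion disabled might look like:

```shell
# Hypothetical sketch: assumes GGML_WEBGPU_DISABLE_FUSION is a CMake
# option; check the repository's build docs for the actual mechanism.
cmake -B build -DGGML_WEBGPU=ON -DGGML_WEBGPU_DISABLE_FUSION=ON
cmake --build build --config Release
```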
🐛 Bug Fixes
- Fixed epsilon handling in WebGPU fused operations.
- Removed reliance on C++20 initializers in the WebGPU implementation.
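To make the fused operation (and the role of epsilon) concrete, here is a minimal NumPy sketch of the standard RMSNorm formula followed by an elementwise multiply, the two steps the fusion combines into one kernel. Function names are illustrative, not ggml API, and the epsilon placement follows the usual RMSNorm definition rather than ggml source.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Standard RMSNorm: x / sqrt(mean(x^2) + eps), computed per row.
    # eps keeps the denominator nonzero for all-zero rows.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def rms_norm_mul(x, w, eps=1e-6):
    # Fused form: produce rms_norm(x) * w in one pass, avoiding the
    # intermediate tensor that a separate MUL op would have to read back.
    return rms_norm(x, eps) * w

x = np.array([[1.0, 2.0, 3.0, 4.0]])
w = np.array([0.5, 0.5, 0.5, 0.5])
out = rms_norm_mul(x, w)
```

The fused result is numerically identical to running the two ops in sequence; the win is avoiding an intermediate buffer and an extra kernel dispatch on the GPU.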