b8811
📦 llama-cpp
✨ 2 features · 🐛 6 fixes · 🔧 2 symbols
Summary
This release delivers significant performance gains in the ggml-webgpu backend by batching compute passes, and it removes profiling overhead from the non-profiling path. It also includes several fixes to the matmul implementations and the GPU profiling machinery.
Migration Steps
- Removed iOS throttling now that compute passes are batched.
✨ New Features
- Implemented compute pass batching for ggml-webgpu when not profiling.
- Added support for f32 accumulation in register tiling matmul.
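Compute pass batching reduces per-submission overhead: instead of submitting one command buffer per operation, the backend records many dispatches and submits them together. A minimal sketch of the pattern in C++, with hypothetical `Encoder`/`Queue` stand-ins rather than the actual ggml-webgpu or Dawn API:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a GPU command encoder and queue; the real
// backend records WebGPU compute passes and submits via the device queue.
struct Encoder {
    std::vector<std::function<void()>> dispatches;
    void dispatch(std::function<void()> fn) { dispatches.push_back(std::move(fn)); }
};

struct Queue {
    int submits = 0;  // each submit models an expensive driver round-trip
    void submit(Encoder &enc) {
        ++submits;
        for (auto &d : enc.dispatches) d();
        enc.dispatches.clear();
    }
};

// Unbatched: one encoder and one submission per operation.
int run_unbatched(Queue &q, int n_ops) {
    for (int i = 0; i < n_ops; ++i) {
        Encoder enc;
        enc.dispatch([] {});
        q.submit(enc);
    }
    return q.submits;
}

// Batched: record every dispatch into one encoder, then submit once.
int run_batched(Queue &q, int n_ops) {
    Encoder enc;
    for (int i = 0; i < n_ops; ++i) enc.dispatch([] {});
    q.submit(enc);
    return q.submits;
}
```

For an 8-op graph this turns eight submissions into one, which is the kind of saving the batching feature targets (the release notes keep batching off while profiling, since timestamps are taken per pass).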
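"Register tiling" keeps a small block of output accumulators in registers while streaming over the shared K dimension; accumulating that block in f32 avoids precision loss even when inputs are stored at lower precision. This is an illustrative CPU-side sketch of the technique, not the backend's shader code:

```cpp
#include <vector>

// Register-tiled matmul sketch: C (MxN) = A (MxK) * B (KxN), row-major.
// Each TMxTN output tile is held in local accumulators of type float
// (f32), regardless of the storage precision of A and B upstream.
void matmul_tiled(const std::vector<float> &A, const std::vector<float> &B,
                  std::vector<float> &C, int M, int N, int K) {
    const int TM = 2, TN = 2;  // register tile size (illustrative)
    for (int i0 = 0; i0 < M; i0 += TM) {
        for (int j0 = 0; j0 < N; j0 += TN) {
            float acc[TM][TN] = {};  // f32 accumulators live in registers
            for (int k = 0; k < K; ++k) {
                for (int ti = 0; ti < TM && i0 + ti < M; ++ti) {
                    float a = A[(i0 + ti) * K + k];
                    for (int tj = 0; tj < TN && j0 + tj < N; ++tj) {
                        acc[ti][tj] += a * B[k * N + (j0 + tj)];
                    }
                }
            }
            // Write the finished tile back to C once, after the K loop.
            for (int ti = 0; ti < TM && i0 + ti < M; ++ti)
                for (int tj = 0; tj < TN && j0 + tj < N; ++tj)
                    C[(i0 + ti) * N + (j0 + tj)] = acc[ti][tj];
        }
    }
}
```

The design point is that each output element is read and written once while the hot loop touches only registers; in a WGSL shader the `acc` array would map to per-invocation registers rather than workgroup memory.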
🐛 Bug Fixes
- Fixed profiling code.
- Fixed the register tiling matmul on Chrome (worked around issues in Dawn, Chrome's WebGPU implementation).
- Updated batch tuning value for iOS.
- Fixed compilation issues.
- Fixed usage of the new load function.
- Consolidated GPU profiling to use a single query set.
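Consolidating profiling onto a single query set means allocating one pool of timestamp slots up front and handing each profiled operation a begin/end index pair into that pool, instead of creating a query set per dispatch. A hypothetical sketch of the allocation scheme only (names are illustrative, not the ggml-webgpu API):

```cpp
#include <utility>

// One shared pool of timestamp slots; each profiled op reserves a
// (begin, end) pair of indices into the same underlying query set.
struct QuerySetAllocator {
    int capacity;  // total timestamp slots in the single query set
    int next = 0;  // next free slot
    explicit QuerySetAllocator(int cap) : capacity(cap) {}

    // Reserve two consecutive slots; returns {-1, -1} when exhausted.
    std::pair<int, int> reserve_pair() {
        if (next + 2 > capacity) return {-1, -1};
        int begin = next;
        next += 2;
        return {begin, begin + 1};
    }

    void reset() { next = 0; }  // reuse the pool for the next graph
};
```

Besides avoiding per-op query-set creation, this lets all timestamps be resolved and read back in one pass after the graph finishes.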