b8811
📦 llama-cpp
✨ 2 features · 🐛 6 fixes · 🔧 2 symbols
Summary
This release delivers significant performance gains in the ggml-webgpu backend by batching compute passes, and it removes profiling overhead from the non-profiling path. It also includes several fixes to the matmul implementations and the GPU profiling machinery.
Migration Steps
- Removed iOS throttling now that compute passes are batched.
✨ New Features
- Implemented compute pass batching for ggml-webgpu when not profiling.
- Added support for f32 accumulation in register tiling matmul.
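Compute pass batching reduces per-submission overhead: instead of submitting one command buffer per operation, the backend records many dispatches and submits them together. A minimal sketch of the pattern in C++, with hypothetical `Encoder`/`Queue` stand-ins rather than the actual ggml-webgpu or Dawn API:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a GPU command encoder and queue; the real
// backend records WebGPU compute passes and submits via the device queue.
struct Encoder {
    std::vector<std::function<void()>> dispatches;
    void dispatch(std::function<void()> fn) { dispatches.push_back(std::move(fn)); }
};

struct Queue {
    int submits = 0;  // each submit models an expensive driver round-trip
    void submit(Encoder &enc) {
        ++submits;
        for (auto &d : enc.dispatches) d();
        enc.dispatches.clear();
    }
};

// Unbatched: one encoder and one submission per operation.
int run_unbatched(Queue &q, int n_ops) {
    for (int i = 0; i < n_ops; ++i) {
        Encoder enc;
        enc.dispatch([] {});
        q.submit(enc);
    }
    return q.submits;
}

// Batched: record every dispatch into one encoder, then submit once.
int run_batched(Queue &q, int n_ops) {
    Encoder enc;
    for (int i = 0; i < n_ops; ++i) enc.dispatch([] {});
    q.submit(enc);
    return q.submits;
}
```

For an 8-op graph this turns eight submissions into one, which is the kind of saving the batching feature targets (the release notes keep batching off while profiling, since timestamps are taken per pass).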
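"Register tiling" keeps a small block of output accumulators in registers while streaming over the shared K dimension; accumulating that block in f32 avoids precision loss even when inputs are stored at lower precision. This is an illustrative CPU-side sketch of the technique, not the backend's shader code:

```cpp
#include <vector>

// Register-tiled matmul sketch: C (MxN) = A (MxK) * B (KxN), row-major.
// Each TMxTN output tile is held in local accumulators of type float
// (f32), regardless of the storage precision of A and B upstream.
void matmul_tiled(const std::vector<float> &A, const std::vector<float> &B,
                  std::vector<float> &C, int M, int N, int K) {
    const int TM = 2, TN = 2;  // register tile size (illustrative)
    for (int i0 = 0; i0 < M; i0 += TM) {
        for (int j0 = 0; j0 < N; j0 += TN) {
            float acc[TM][TN] = {};  // f32 accumulators live in registers
            for (int k = 0; k < K; ++k) {
                for (int ti = 0; ti < TM && i0 + ti < M; ++ti) {
                    float a = A[(i0 + ti) * K + k];
                    for (int tj = 0; tj < TN && j0 + tj < N; ++tj) {
                        acc[ti][tj] += a * B[k * N + (j0 + tj)];
                    }
                }
            }
            // Write the finished tile back to C once, after the K loop.
            for (int ti = 0; ti < TM && i0 + ti < M; ++ti)
                for (int tj = 0; tj < TN && j0 + tj < N; ++tj)
                    C[(i0 + ti) * N + (j0 + tj)] = acc[ti][tj];
        }
    }
}
```

The design point is that each output element is read and written once while the hot loop touches only registers; in a WGSL shader the `acc` array would map to per-invocation registers rather than workgroup memory.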
🐛 Bug Fixes
- Fixed profiling code.
- Fixed the register tiling matmul on Chrome (worked around issues in Dawn, Chrome's WebGPU implementation).
- Updated batch tuning value for iOS.
- Fixed compilation issues.
- Fixed usage of the new load function.
- Consolidated GPU profiling to use a single query set.
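Consolidating profiling onto a single query set means allocating one pool of timestamp slots up front and handing each profiled operation a begin/end index pair into that pool, instead of creating a query set per dispatch. A hypothetical sketch of the allocation scheme only (names are illustrative, not the ggml-webgpu API):

```cpp
#include <utility>

// One shared pool of timestamp slots; each profiled op reserves a
// (begin, end) pair of indices into the same underlying query set.
struct QuerySetAllocator {
    int capacity;  // total timestamp slots in the single query set
    int next = 0;  // next free slot
    explicit QuerySetAllocator(int cap) : capacity(cap) {}

    // Reserve two consecutive slots; returns {-1, -1} when exhausted.
    std::pair<int, int> reserve_pair() {
        if (next + 2 > capacity) return {-1, -1};
        int begin = next;
        next += 2;
        return {begin, begin + 1};
    }

    void reset() { next = 0; }  // reuse the pool for the next graph
};
```

Besides avoiding per-op query-set creation, this lets all timestamps be resolved and read back in one pass after the graph finishes.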