Changelog

b8811

📦 llama-cpp
✨ 2 features · 🐛 6 fixes · 🔧 2 symbols

Summary

This release improves performance in the ggml-webgpu backend by batching compute passes (when profiling is disabled) and reduces profiling overhead by consolidating GPU timing queries. It also fixes several issues in the register tiling matmul implementation and the GPU profiling code.

Migration Steps

  1. iOS throttling has been removed now that compute passes are batched.

✨ New Features

  • Implemented compute pass batching for ggml-webgpu when not profiling.
  • Added support for f32 accumulation in register tiling matmul.

🐛 Bug Fixes

  • Fixed profiling code.
  • Fixed register tiling matmul on Chrome (worked around issues in Dawn, Chromium's WebGPU implementation).
  • Updated batch tuning value for iOS.
  • Fixed compilation issues.
  • Fixed use of new load function.
  • Consolidated GPU profiling to use a single query set.

Affected Symbols