Change8

b8091

📦 llama-cpp
✨ 3 features · 🐛 4 fixes · 🔧 8 symbols

Summary

This release refactors the ggml WebGPU backend's shaders and adds preliminary JIT compilation for the mul_mat, get_rows, and scale operations. It also starts consolidating the WebGPU shaders into a shared shader library and adds CUDA 13.1 builds on Windows.

✨ New Features

  • Basic JIT compilation implemented for WebGPU operations: mul_mat, get_rows, and scale.
  • Began work on an all-encompassing shader library for ggml WebGPU.
  • Added support for CUDA 13.1 builds on Windows.
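
For readers unfamiliar with the term: JIT compilation of GPU shaders generally means producing a specialized shader at runtime, once per combination of parameters, and reusing it afterwards. The C++ sketch below illustrates that general idea only. It is not the ggml WebGPU implementation, and every name in it (ScaleKey, ShaderCache, kScaleTemplate, the {{TYPE}} and {{WG_SIZE}} placeholders) is hypothetical. It only builds WGSL source text, and a real backend would hand that text to the WebGPU API to create a pipeline.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical specialization key for a "scale" shader (not a ggml type).
struct ScaleKey {
    std::string elem_type; // element type baked into the source, e.g. "f32"
    uint32_t    wg_size;   // workgroup size baked into the source

    bool operator<(const ScaleKey & o) const {
        return std::tie(elem_type, wg_size) < std::tie(o.elem_type, o.wg_size);
    }
};

// Hypothetical WGSL template; the {{...}} placeholders are filled in at runtime.
static const std::string kScaleTemplate = R"(
struct Params { scale: f32 }
@group(0) @binding(0) var<storage, read_write> data: array<{{TYPE}}>;
@group(0) @binding(1) var<uniform> params: Params;

@compute @workgroup_size({{WG_SIZE}})
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    if (gid.x < arrayLength(&data)) {
        data[gid.x] = data[gid.x] * {{TYPE}}(params.scale);
    }
}
)";

// Replace every occurrence of `from` in `s` with `to`.
static std::string replace_all(std::string s, const std::string & from, const std::string & to) {
    for (std::size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos; pos += to.size()) {
        s.replace(pos, from.size(), to);
    }
    return s;
}

// Generates each specialization on first request and caches it by key.
class ShaderCache {
public:
    const std::string & get(const ScaleKey & key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++generated_; // count how many specializations were actually built
            std::string src = replace_all(kScaleTemplate, "{{TYPE}}", key.elem_type);
            src = replace_all(src, "{{WG_SIZE}}", std::to_string(key.wg_size));
            it = cache_.emplace(key, std::move(src)).first;
        }
        return it->second;
    }

    std::size_t generated() const { return generated_; }

private:
    std::map<ScaleKey, std::string> cache_;
    std::size_t generated_ = 0;
};
```

Calling `cache.get({"f32", 64})` twice generates the source once and returns the cached string the second time. A different workgroup size, such as `{"f32", 256}`, produces a second specialization.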

🐛 Bug Fixes

  • Fixed issues related to get_rows and workgroup dispatch in mul_mat WebGPU JIT compilation.
  • Refactored argmax and set_rows shaders.
  • Moved flash attention and matrix multiplication shaders to the new format.
  • Removed duplicate constants during shader refactoring.

Affected Symbols