b8091
📦 llama-cpp
✨ 3 features · 🐛 4 fixes · 🔧 8 symbols
Summary
This release centers on the ggml WebGPU backend: it adds preliminary JIT compilation for the mul_mat, get_rows, and scale operations and starts consolidating shaders into a shared shader library. It also adds CUDA 13.1 build support on Windows.
✨ New Features
- Basic JIT compilation implemented for WebGPU operations: mul_mat, get_rows, and scale.
- Began work on an all-encompassing shader library for ggml WebGPU.
- Added support for CUDA 13.1 builds on Windows.
🐛 Bug Fixes
- Fixed issues related to get_rows and workgroup dispatch in mul_mat WebGPU JIT compilation.
- Refactored argmax and set_rows shaders.
- Moved flash attention and matrix multiplication shaders to the new format.
- Removed duplicate constants during shader refactoring.