b8091

📅 Feb 18, 2026 📦 llama-cpp

✨ 3 features 🐛 4 fixes 🔧 8 symbols

Summary

This release reorganizes the shaders of the ggml WebGPU backend into the start of a shared shader library and adds basic JIT compilation for three WebGPU operations: mul_mat, get_rows, and scale. It also adds support for CUDA 13.1 builds on Windows.

✨ New Features

Basic JIT compilation implemented for WebGPU operations: mul_mat, get_rows, and scale.
Began work on an all-encompassing shader library for ggml WebGPU.
Added support for CUDA 13.1 builds on Windows.
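
As a rough illustration of what JIT compilation means for an operation like scale: the shader source is assembled at runtime for the parameters actually requested, and the result is cached so each variant is built only once. The sketch below is not ggml's code. `ScaleShaderKey`, `build_scale_wgsl`, and `ShaderCache` are hypothetical names, the WGSL text is a simplified stand-in, and a real backend would cache a compiled pipeline rather than a source string.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical specialization parameters for a "scale" shader (not ggml's types).
struct ScaleShaderKey {
    uint32_t workgroup_size;
    bool     inplace;  // true: write the result back into the source buffer

    bool operator<(const ScaleShaderKey & o) const {
        if (workgroup_size != o.workgroup_size) return workgroup_size < o.workgroup_size;
        return inplace < o.inplace;
    }
};

// Assemble WGSL source text for one specialization.
std::string build_scale_wgsl(const ScaleShaderKey & key) {
    const std::string out = key.inplace ? "src" : "dst";
    const std::string params_binding = key.inplace ? "1" : "2";

    std::string s;
    s += "struct Params { ne: u32, scale: f32 };\n";
    s += "@group(0) @binding(0) var<storage, read_write> src: array<f32>;\n";
    if (!key.inplace) {
        s += "@group(0) @binding(1) var<storage, read_write> dst: array<f32>;\n";
    }
    s += "@group(0) @binding(" + params_binding + ") var<uniform> params: Params;\n";
    s += "@compute @workgroup_size(" + std::to_string(key.workgroup_size) + ")\n";
    s += "fn main(@builtin(global_invocation_id) gid: vec3<u32>) {\n";
    s += "    if (gid.x >= params.ne) { return; }\n";
    s += "    " + out + "[gid.x] = src[gid.x] * params.scale;\n";
    s += "}\n";
    return s;
}

// Cache keyed by specialization; the string stands in for a compiled pipeline.
class ShaderCache {
  public:
    const std::string & get(const ScaleShaderKey & key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, build_scale_wgsl(key)).first;
            ++builds_;  // counts how many variants were actually generated
        }
        return it->second;
    }
    int builds() const { return builds_; }

  private:
    std::map<ScaleShaderKey, std::string> cache_;
    int builds_ = 0;
};
```

Requesting the same key twice returns the cached entry, so only distinct parameter combinations pay the generation cost.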

🐛 Bug Fixes

Fixed get_rows and the workgroup dispatch in mul_mat for WebGPU JIT compilation.
Refactored argmax and set_rows shaders.
Moved flashattention and matrix multiplication shaders to the new format.
Removed duplicate constants during shader refactoring.
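
The notes do not say what the mul_mat workgroup dispatch issue was. For background only, here is a minimal sketch of how a tiled matrix multiplication commonly derives its dispatch size: one workgroup per output tile, rounding up so partial tiles at the edges are still covered. `mul_mat_dispatch`, `DispatchSize`, and the tile parameters are hypothetical and are not taken from the ggml source.

```cpp
#include <cstdint>

// Number of workgroups to launch along x and y.
struct DispatchSize {
    uint32_t x;
    uint32_t y;
};

// Ceiling division written to avoid overflow for large a.
constexpr uint32_t ceil_div(uint32_t a, uint32_t b) {
    return a / b + (a % b != 0 ? 1u : 0u);
}

// One workgroup per (tile_rows x tile_cols) tile of the output matrix.
// x walks the output columns, y walks the output rows (an assumed convention).
constexpr DispatchSize mul_mat_dispatch(uint32_t out_rows, uint32_t out_cols,
                                        uint32_t tile_rows, uint32_t tile_cols) {
    return DispatchSize{ ceil_div(out_cols, tile_cols), ceil_div(out_rows, tile_rows) };
}
```

With truncating division instead of rounding up, an output whose size is not a multiple of the tile size would leave its last rows or columns uncomputed, which is why the shader also needs a bounds check inside each edge tile.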

Affected Symbols

ggml webgpu shaders, mul_mat, get_rows, scale, argmax, set_rows, flashattention, matrix multiplication