b8857

📅 Apr 20, 2026 · 📦 llama-cpp

✨ 3 features · 🐛 1 fix · 🔧 7 symbols

Summary

This release updates the matrix-vector multiplication routines for ggml-webgpu, fixing slow performance for the q3_k and q5_k quant types and introducing support for new float formats and q4_0.

Migration Steps

Port any k-quant code affected by these changes to the new matvec implementation.
Remove any use of the old shader from custom builds.

✨ New Features

Updated matrix-vector multiplication for ggml-webgpu.
The new-format float paths now work.
q4_0 support is implemented and working.
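
For reference, here is a minimal sketch of what a q4_0 matrix-vector product computes. It assumes the usual ggml q4_0 block layout: 32 weights per block, one scale, and 16 bytes of packed 4-bit values. It is a plain-Python CPU illustration, not the WebGPU shader from this release, and the function names are hypothetical.

```python
QK4_0 = 32  # weights per q4_0 block

def dequantize_q4_0_block(d, qs):
    """Expand one block (scale d, 16 packed bytes qs) into 32 floats."""
    half = QK4_0 // 2
    out = [0.0] * QK4_0
    for j in range(half):
        out[j] = ((qs[j] & 0x0F) - 8) * d        # low nibble -> element j
        out[j + half] = ((qs[j] >> 4) - 8) * d   # high nibble -> element j + 16
    return out

def matvec_q4_0(rows, x):
    """y = W @ x, where each row of W is a list of (d, qs) blocks."""
    y = []
    for blocks in rows:
        acc = 0.0
        for b, (d, qs) in enumerate(blocks):
            w = dequantize_q4_0_block(d, qs)
            base = b * QK4_0  # offset of this block's slice of x
            acc += sum(wi * x[base + i] for i, wi in enumerate(w))
        y.append(acc)
    return y

# One row, one block: low nibbles are 8 (-> 0.0), high nibbles are 9 (-> 0.5).
row = [(0.5, bytes([0x98] * 16))]
x = [1.0] * QK4_0
print(matvec_q4_0([row], x))  # [8.0]
```

A GPU implementation would typically fuse the dequantization into the dot product and split the work across threads; the sketch keeps the two steps separate for readability.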

🐛 Bug Fixes

Resolved slow performance for q3_k and q5_k quantizations using u32 indexing in ggml-webgpu.
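
The note mentions u32 indexing without further detail. As background only: WGSL has no 8-bit integer type, so byte-oriented quant data is commonly read through 32-bit words. The sketch below shows that addressing pattern in Python. The helper name `load_u8` is hypothetical and nothing here is taken from the actual shader source.

```python
import struct

def load_u8(words, byte_index):
    """Read byte `byte_index` from a buffer exposed as little-endian u32 words."""
    word = words[byte_index >> 2]   # which 32-bit word holds the byte
    shift = (byte_index & 3) * 8    # bit offset of the byte inside that word
    return (word >> shift) & 0xFF

raw = bytes(range(8))               # bytes 0x00..0x07
words = struct.unpack("<2I", raw)   # the same data viewed as two u32 words
assert [load_u8(words, i) for i in range(8)] == list(raw)
```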

Affected Symbols

ggml-webgpu matrix-vector multiplication
q3_k indexing
q5_k indexing
q4_0 implementation
k-quants matvec implementation
old shader
old constants and format