b8857
📦 llama-cpp
✨ 3 features · 🐛 1 fix · 🔧 7 symbols
Summary
This release updates the matrix-vector multiplication (matvec) routines in ggml-webgpu. It fixes slow performance for the q3_k and q5_k quantization types, gets the float paths working in the new format, and adds working q4_0 support.
Migration Steps
- If your code is affected by these changes, port k-quants to the new matvec implementation.
- Remove any use of the old shader from custom builds.
✨ New Features
- Updated matrix-vector multiplication for ggml-webgpu.
- Float paths now work in the new format.
- q4_0 is now supported and working.
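
For readers unfamiliar with q4_0, the following is a minimal CPU-side sketch of what a q4_0 matrix-vector product computes. It is not the WebGPU shader from this release. It assumes the standard ggml q4_0 layout (blocks of 32 weights, each stored as one little-endian fp16 scale followed by 16 bytes of packed 4-bit values), and the function names `dequantize_q4_0_block` and `matvec_q4_0` are hypothetical.

```python
import struct

QK4_0 = 32                    # weights per q4_0 block
BLOCK_BYTES = 2 + QK4_0 // 2  # fp16 scale + 16 bytes of packed 4-bit values


def dequantize_q4_0_block(block: bytes) -> list[float]:
    """Expand one 18-byte q4_0 block into 32 floats."""
    assert len(block) == BLOCK_BYTES
    (d,) = struct.unpack_from("<e", block, 0)  # little-endian fp16 scale
    qs = block[2:]
    half = QK4_0 // 2
    out = [0.0] * QK4_0
    for j in range(half):
        out[j] = ((qs[j] & 0x0F) - 8) * d       # low nibble  -> element j
        out[j + half] = ((qs[j] >> 4) - 8) * d  # high nibble -> element j + 16
    return out


def matvec_q4_0(rows: list[bytes], x: list[float]) -> list[float]:
    """Compute y = W @ x, where each row of W is a run of q4_0 blocks."""
    assert len(x) % QK4_0 == 0
    n_blocks = len(x) // QK4_0
    y = []
    for row in rows:
        assert len(row) == n_blocks * BLOCK_BYTES
        acc = 0.0
        for b in range(n_blocks):
            w = dequantize_q4_0_block(row[b * BLOCK_BYTES:(b + 1) * BLOCK_BYTES])
            xs = x[b * QK4_0:(b + 1) * QK4_0]
            acc += sum(wi * xi for wi, xi in zip(w, xs))
        y.append(acc)
    return y
```

A GPU kernel would typically fuse the dequantization into the dot product and split rows across threads. The arithmetic per weight is the same: `(nibble - 8) * scale`.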
🐛 Bug Fixes
- Fixed slow u32 indexing for the q3_k and q5_k quantizations in ggml-webgpu.
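
As background on why u32 indexing matters: WGSL has no 8-bit integer type, so byte-oriented quantized data is commonly read through 32-bit words, and each byte access becomes a word lookup plus a shift and mask. The release notes do not describe the actual change, so the sketch below only illustrates the general technique in Python. The helper names `load_u32_words`, `byte_at`, and `u32_at` are hypothetical and do not come from ggml-webgpu.

```python
import struct


def load_u32_words(raw: bytes) -> list[int]:
    """View a byte buffer as little-endian u32 words."""
    assert len(raw) % 4 == 0
    return list(struct.unpack(f"<{len(raw) // 4}I", raw))


def byte_at(words: list[int], byte_offset: int) -> int:
    """Fetch one byte: select the containing word, then shift and mask."""
    word = words[byte_offset >> 2]
    shift = (byte_offset & 3) * 8
    return (word >> shift) & 0xFF


def u32_at(words: list[int], byte_offset: int) -> int:
    """Read 32 bits at an arbitrary, possibly unaligned, byte offset."""
    idx = byte_offset >> 2
    shift = (byte_offset & 3) * 8
    if shift == 0:
        return words[idx]  # aligned: a single word load
    lo = words[idx] >> shift
    hi = (words[idx + 1] << (32 - shift)) & 0xFFFFFFFF
    return lo | hi         # unaligned: stitch two neighboring words
```

The unaligned branch costs two loads and extra bit manipulation, which is one reason block layouts whose fields do not fall on 4-byte boundaries can be slower to index this way.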