Change8

b8857

📦 llama-cpp
3 features · 🐛 1 fix · 🔧 7 symbols

Summary

This release updates the matrix-vector multiplication (matvec) routines in ggml-webgpu. It fixes slow performance with the q3_k and q5_k quantization types and adds working support for the new float formats and for q4_0.

Migration Steps

  1. If you maintain k-quant kernels affected by these changes, port them to the new matvec implementation.
  2. If a custom build still uses the old shader, remove that usage.

✨ New Features

  • Updated the matrix-vector multiplication (matvec) implementation in ggml-webgpu.
  • The new float-format paths now work.
  • Implemented working q4_0 support.
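These notes do not show the new ggml-webgpu kernels. The following CPU-side Python sketch illustrates what a quantized matvec does for q4_0. It assumes ggml's q4_0 block layout: 32 weights per block, one fp16 scale `d`, and 16 bytes of packed 4-bit codes, with each weight decoded as `(q - 8) * d`. The helper names `pack_q4_0_block`, `dequantize_q4_0_block` and `matvec_q4_0` are hypothetical. They are not part of ggml or its WebGPU shaders.

```python
import struct

QK4_0 = 32                      # weights per q4_0 block (assumed ggml layout)
BLOCK_BYTES = 2 + QK4_0 // 2    # fp16 scale + 16 bytes of packed nibbles = 18

def pack_q4_0_block(d, quants):
    """Hypothetical helper: build one block from a scale and 32 codes in 0..15."""
    assert len(quants) == QK4_0 and all(0 <= q <= 15 for q in quants)
    half = QK4_0 // 2
    # Element j goes in the low nibble of byte j, element j + 16 in its high nibble.
    qs = bytes(quants[j] | (quants[j + half] << 4) for j in range(half))
    return struct.pack("<e", d) + qs

def dequantize_q4_0_block(block):
    """Decode one 18-byte block into 32 floats: (code - 8) * d."""
    d = struct.unpack_from("<e", block, 0)[0]
    qs = block[2:]
    half = QK4_0 // 2
    out = [0.0] * QK4_0
    for j in range(half):
        out[j] = ((qs[j] & 0x0F) - 8) * d
        out[j + half] = ((qs[j] >> 4) - 8) * d
    return out

def matvec_q4_0(rows, x):
    """y[r] = dot(dequantized row r, x); each row is a bytes object of whole blocks."""
    y = []
    for row in rows:
        n_blocks = len(row) // BLOCK_BYTES
        assert n_blocks * QK4_0 == len(x)
        acc = 0.0
        for b in range(n_blocks):
            # Decode one block at a time and accumulate its partial dot product.
            w = dequantize_q4_0_block(row[b * BLOCK_BYTES:(b + 1) * BLOCK_BYTES])
            acc += sum(wi * xi for wi, xi in zip(w, x[b * QK4_0:(b + 1) * QK4_0]))
        y.append(acc)
    return y
```

The sketch decodes each block inside the dot product rather than first expanding the whole matrix to fp32. This keeps a quantized matvec memory-efficient. A GPU kernel would also split rows and blocks across threads, which this sketch does not model.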

🐛 Bug Fixes

  • Fixed slow q3_k and q5_k performance in ggml-webgpu by switching to u32 indexing.
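The notes do not describe exactly what the u32 indexing change does. WGSL has no 8-bit storage type, so a WebGPU shader that reads quantized weights generally fetches them as `u32` words and extracts bytes with shifts and masks. The following Python sketch mirrors that addressing on the CPU, on the assumption that this is the kind of indexing the fix refers to. `bytes_to_u32_words` and `read_byte` are hypothetical helpers, not ggml-webgpu code. The sketch does not explain why the previous q3_k/q5_k path was slow.

```python
import struct

def bytes_to_u32_words(data):
    """Hypothetical helper: view bytes as little-endian u32 words, zero-padding the tail."""
    padded = data + b"\x00" * (-len(data) % 4)
    return list(struct.unpack("<%dI" % (len(padded) // 4), padded))

def read_byte(words, byte_index):
    """Hypothetical helper: fetch byte `byte_index` from an array of little-endian u32 words."""
    word = words[byte_index >> 2]      # which u32 word holds the byte
    shift = (byte_index & 3) * 8       # bit offset of the byte inside that word
    return (word >> shift) & 0xFF
```

For example, the ten bytes `01 … 0A` become three words, the first being `0x04030201`, and `read_byte(words, 5)` returns `6`.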

Affected Symbols