b8882
📦 llama-cpp
✨ 2 features · 🐛 9 fixes · ⚡ 1 deprecation · 🔧 10 symbols
Summary
This release adds support for conv2d kernels in the ggml-webgpu shaders and includes numerous stability fixes for f16 precision and packed-integer handling across various operations. Internal code has been cleaned up, deprecated quant structs have been removed, and WebGPU backend instance management has been improved.
Migration Steps
- If present in custom code, remove error-override logic specific to the F16 type.
✨ New Features
- Added support for conv2d kernels in ggml-webgpu shaders.
- Kept one Dawn/WebGPU instance alive for the lifetime of the static backend.
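The instance-lifetime feature above follows a common C++ pattern: a function-local static is constructed once and lives until program exit, so every caller shares one live instance instead of creating and tearing one down per backend init. A minimal sketch of that pattern, using a hypothetical `Instance` stand-in rather than the real Dawn/`wgpu::Instance` API:

```cpp
#include <cassert>

// Hedged sketch: `Instance` is a hypothetical stand-in for a Dawn/WebGPU
// instance handle; the real ggml-webgpu code manages an actual Dawn
// instance, not this type.
struct Instance {
    int id;
};

// A function-local static is constructed on first use and destroyed only
// at program exit, so the instance stays alive for the lifetime of the
// static backend and every caller gets the same object.
static Instance &get_instance() {
    static Instance inst{42};
    return inst;
}
```

Repeated calls return the same object, which is what keeps the instance alive across backend re-initialization.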
🐛 Bug Fixes
- Fixed busy-polling in Emscripten `waitAny` after #20618 in ggml-webgpu and removed the noisy WebGPU busy log.
- Fixed a GET_ROWS packed-integer NaN when f16 is used as the shader-quant memory buffer type.
- Updated the unary WGSL EXP and EXPM1 kernels for f16 stability.
- Fixed the GET_ROWS IQ4_XS struct for f16 NaN canonicalization.
- Fixed numerical precision of the unary sqrt op when working with f16.
- Fixed NaN canonicalization for packed integers using f16.
- Updated the error threshold for binary div ops when using f16.
- Restored the proper initialization of `ctx`, which had been accidentally removed.
- Fixed an out-of-bounds memory access in weight indexing in the conv2d shader.
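Several of the fixes above concern NaN canonicalization for f16 values packed into integers. The general technique (this is an illustrative sketch, not the actual ggml-webgpu shader code): when two f16 values travel packed in one u32, any NaN bit pattern is replaced with a single canonical quiet NaN (`0x7E00`) before further bit-level processing, so NaN payloads cannot leak through packing and comparisons stay deterministic.

```cpp
#include <cassert>
#include <cstdint>

// An f16 is NaN when its exponent bits are all ones and its mantissa is
// nonzero (infinity has the same exponent but a zero mantissa).
static bool f16_is_nan(uint16_t h) {
    return (h & 0x7C00u) == 0x7C00u && (h & 0x03FFu) != 0u;
}

// Canonicalize one f16 half: map every NaN payload (including the sign
// bit) to the canonical quiet NaN 0x7E00; pass all other values through.
static uint16_t f16_canonicalize(uint16_t h) {
    return f16_is_nan(h) ? uint16_t(0x7E00u) : h;
}

// Canonicalize both f16 halves packed in a u32
// (lo half = bits 0..15, hi half = bits 16..31).
static uint32_t pack2_canonicalize(uint32_t packed) {
    uint16_t lo = f16_canonicalize(uint16_t(packed & 0xFFFFu));
    uint16_t hi = f16_canonicalize(uint16_t(packed >> 16));
    return uint32_t(lo) | (uint32_t(hi) << 16);
}
```

For example, a packed value whose high half is the signaling NaN `0x7C01` comes out with that half rewritten to `0x7E00`, while infinities and ordinary values pass through untouched.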
Affected Symbols
⚡ Deprecations
- Removed deprecated quant structs.