b8882
📦 llama-cpp
✨ 2 features · 🐛 9 fixes · ⚡ 1 deprecation · 🔧 10 symbols
Summary
This release adds support for conv2d kernels in the ggml-webgpu shaders and includes numerous stability fixes for f16 precision and packed-integer handling across various operations. Internal code has been cleaned up, deprecated quant structs have been removed, and WebGPU backend instance management has been improved.
Migration Steps
- If present in custom code, remove error-override logic specific to the F16 type.
✨ New Features
- Added support for conv2d kernels in ggml-webgpu shaders.
- Kept one Dawn/WebGPU instance alive for the lifetime of the static backend.
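The instance-lifetime feature above follows a common C++ pattern: a function-local static is constructed once and lives until program exit, so every caller shares one live instance instead of creating and tearing one down per backend init. A minimal sketch of that pattern, using a hypothetical `Instance` stand-in rather than the real Dawn/`wgpu::Instance` API:

```cpp
#include <cassert>

// Hedged sketch: `Instance` is a hypothetical stand-in for a Dawn/WebGPU
// instance handle; the real ggml-webgpu code manages an actual Dawn
// instance, not this type.
struct Instance {
    int id;
};

// A function-local static is constructed on first use and destroyed only
// at program exit, so the instance stays alive for the lifetime of the
// static backend and every caller gets the same object.
static Instance &get_instance() {
    static Instance inst{42};
    return inst;
}
```

Repeated calls return the same object, which is what keeps the instance alive across backend re-initialization.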
🐛 Bug Fixes
- Fixed busy-polling in Emscripten `waitAny` after #20618 in ggml-webgpu and removed the noisy WebGPU busy log.
- Fixed a GET_ROWS packed-integer NaN when f16 is used as the shader-quant memory buffer type.
- Updated the unary WGSL EXP and EXPM1 kernels for f16 stability.
- Fixed the GET_ROWS IQ4_XS struct for f16 NaN canonicalization.
- Fixed numerical precision of the unary sqrt op when working with f16.
- Fixed NaN canonicalization for packed integers using f16.
- Updated the error threshold for binary div ops when using f16.
- Restored the proper initialization of `ctx`, which had been accidentally removed.
- Fixed an out-of-bounds memory access in weight indexing in the conv2d shader.
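Several of the fixes above concern NaN canonicalization for f16 values packed into integers. The general technique (this is an illustrative sketch, not the actual ggml-webgpu shader code): when two f16 values travel packed in one u32, any NaN bit pattern is replaced with a single canonical quiet NaN (`0x7E00`) before further bit-level processing, so NaN payloads cannot leak through packing and comparisons stay deterministic.

```cpp
#include <cassert>
#include <cstdint>

// An f16 is NaN when its exponent bits are all ones and its mantissa is
// nonzero (infinity has the same exponent but a zero mantissa).
static bool f16_is_nan(uint16_t h) {
    return (h & 0x7C00u) == 0x7C00u && (h & 0x03FFu) != 0u;
}

// Canonicalize one f16 half: map every NaN payload (including the sign
// bit) to the canonical quiet NaN 0x7E00; pass all other values through.
static uint16_t f16_canonicalize(uint16_t h) {
    return f16_is_nan(h) ? uint16_t(0x7E00u) : h;
}

// Canonicalize both f16 halves packed in a u32
// (lo half = bits 0..15, hi half = bits 16..31).
static uint32_t pack2_canonicalize(uint32_t packed) {
    uint16_t lo = f16_canonicalize(uint16_t(packed & 0xFFFFu));
    uint16_t hi = f16_canonicalize(uint16_t(packed >> 16));
    return uint32_t(lo) | (uint32_t(hi) << 16);
}
```

For example, a packed value whose high half is the signaling NaN `0x7C01` comes out with that half rewritten to `0x7E00`, while infinities and ordinary values pass through untouched.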
Affected Symbols
⚡ Deprecations
- Removed deprecated quant structs.