Changelog

b8749

📦 llama-cpp
✨ 1 feature · 🐛 9 fixes · ⚡ 1 deprecation · 🔧 8 symbols

Summary

This release addresses numerous quantization precision issues in the ggml WebGPU backend, particularly around f16 stability and NaN handling. It also improves WebGPU backend lifecycle management and removes deprecated code.

Migration Steps

  1. Remove any custom error-override logic for the F16 type, as the internal override has been removed.

✨ New Features

  • Improved backend lifecycle management for WebGPU by keeping one Dawn/WebGPU instance alive for the lifetime of the static backend.

🐛 Bug Fixes

  • Fixed busy-polling in the Emscripten waitAny implementation for ggml-webgpu, following changes in #20618.
  • Fixed GET_ROWS packed-integer NaN handling when f16 is used as the memory buffer type in shader quants.
  • Updated the unary WGSL EXP and EXPM1 shaders for f16 stability.
  • Fixed the GET_ROWS IQ4_XS struct for f16 NaN canonicalization.
  • Fixed numerical precision of the unary sqrt operation with f16.
  • Fixed NaN canonicalization for packed integers using f16.
  • Updated the error threshold for binary division operations with f16.
  • Restored context (ctx) initialization that had been accidentally removed.
  • Ensured test operations were not modified.

Affected Symbols

⚡ Deprecations

  • Removed deprecated quant structs.