Changelog

b8749

📦 llama-cpp
✨ 1 feature · 🐛 9 fixes · ⚡ 1 deprecation · 🔧 8 symbols

Summary

This release addresses numerous quantization precision issues in the ggml WebGPU backend, particularly around f16 stability and NaN handling. It also improves WebGPU backend lifecycle management and removes deprecated code.

Migration Steps

  1. Remove any custom error-override logic for the F16 type, as the internal override has been removed.

✨ New Features

  • Improved backend lifecycle management for WebGPU by keeping one Dawn/WebGPU instance alive for the lifetime of the static backend.

🐛 Bug Fixes

  • Fixed busy-polling in the Emscripten waitAny implementation for ggml-webgpu, following changes in #20618.
  • Fixed GET_ROWS packed-integer NaN handling when f16 is used as the memory buffer type in shader quants.
  • Updated the unary WGSL EXP and EXPM1 shaders for f16 stability.
  • Fixed the GET_ROWS IQ4_XS struct for f16 NaN canonicalization.
  • Fixed numerical precision of the unary sqrt operation with f16.
  • Fixed NaN canonicalization for packed integers using f16.
  • Updated the error threshold for binary division operations with f16.
  • Restored context (ctx) initialization that had been accidentally removed.
  • Ensured test operations were not modified.

Affected Symbols

⚡ Deprecations

  • Removed deprecated quant structs.