b8607

📦 llama-cpp
✨ 2 features · 🐛 3 fixes · 🔧 4 symbols

Summary

This release focuses on improving ggml WebGPU support by moving quantized buffers to u32 storage and removing synchronous operations, alongside general cleanup and a deadlock fix.

Migration Steps

  1. Move to unpackf16 for wider compatibility in ggml operations.

✨ New Features

  • ggml webgpu: quantized buffers updated to u32, improving browser/device support.
  • Work towards removing bitcast in ggml.

🐛 Bug Fixes

  • Restored the timeout in the wait function.
  • Removed synchronous set_tensor/memset_tensor calls.
  • Removed a deadlock condition in free_bufs.

Affected Symbols