b8607
📦 llama-cpp
✨ 2 features · 🐛 3 fixes · 🔧 4 symbols
Summary
This release improves the ggml WebGPU backend by moving quantized buffers to u32 and removing synchronous tensor operations, alongside general cleanup and a deadlock fix.
Migration Steps
- Move to unpackf16 for wider compatibility in ggml operations.
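The migration step above concerns how shaders read 16-bit floats once buffers are bound as 32-bit words. As a rough illustration (in Python rather than WGSL, and with the exact behavior of `unpackf16` assumed from its name, not confirmed from the source), unpacking two half floats from one u32 word looks like:

```python
import struct

def unpack_f16_pair(word: int) -> tuple[float, float]:
    # Illustrative sketch only: mirrors the kind of helper the notes call
    # unpackf16. WebGPU storage buffers are addressed in 32-bit units, so a
    # shader reads a u32 and splits it into two IEEE-754 binary16 values.
    lo = word & 0xFFFF          # low half: first f16 (little-endian layout)
    hi = (word >> 16) & 0xFFFF  # high half: second f16
    # struct format 'e' decodes a 2-byte half-precision float
    return (struct.unpack('<e', lo.to_bytes(2, 'little'))[0],
            struct.unpack('<e', hi.to_bytes(2, 'little'))[0])
```

Reading whole u32 words and unpacking in the shader avoids relying on 16-bit storage-buffer types, which not every browser or device exposes.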
✨ New Features
- ggml webgpu: quantized buffers updated to u32, improving browser/device support.
- Ongoing work toward removing bitcast usage in ggml.
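Storing quantized buffers as u32 sidesteps devices that lack smaller storage-buffer element types: the shader reads whole 32-bit words and extracts individual quant bytes with shifts and masks. A minimal Python sketch of that extraction pattern (illustrative only, not the actual WGSL, and the helper name is hypothetical):

```python
def extract_i8(words: list[int], byte_index: int) -> int:
    # Hypothetical helper: read one signed 8-bit quant out of a buffer
    # packed as little-endian u32 words, the view a WebGPU storage buffer
    # of u32 gives a shader.
    word = words[byte_index >> 2]                 # which 32-bit word
    b = (word >> ((byte_index & 3) * 8)) & 0xFF   # which byte within it
    return b - 256 if b >= 128 else b             # sign-extend to int8

# Example: the byte stream [1, 255, 128, 0] packed into one u32 word.
packed = [int.from_bytes(bytes([1, 255, 128, 0]), "little")]
```

The same shift-and-mask approach generalizes to 4-bit and other sub-byte quant formats, at the cost of a little extra shader arithmetic per element.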
🐛 Bug Fixes
- Restored the timeout in the wait function.
- Removed synchronous set_tensor/memset_tensor calls.
- Removed deadlock condition in free_bufs.