b8698
📦 llama-cpp
✨ 6 features · 🐛 2 fixes · 🔧 1 symbol
Summary
This release focuses on optimizing and stabilizing the ggml-webgpu backend: submission sizes are now parameterized, iOS-specific submission limits were added, and a deadlock in free_bufs was removed. Internal refactoring moves existing types and simplifies profiling futures.
Migration Steps
- Migrate to `unpackf16` for wider compatibility (if applicable to your build process).
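The `unpackf16` migration appears related to the "removing bitcast" work below: instead of bitcasting raw bits, two f16 values packed in a 32-bit word are decoded explicitly (as WGSL's `unpack2x16float` does). A minimal host-side C++ sketch of that decoding, assuming IEEE 754 binary16 layout; the names `half_to_float` and `unpack_f16x2` are illustrative, not the project's API:

```cpp
#include <cmath>
#include <cstdint>
#include <utility>

// Decode one IEEE 754 binary16 value stored in a uint16_t.
// (Illustrative helper, not part of llama.cpp's API.)
static float half_to_float(uint16_t h) {
    uint32_t sign = (h >> 15) & 1;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    float s = sign ? -1.0f : 1.0f;
    if (exp == 0)    return s * std::ldexp((float) mant, -24);        // subnormal
    if (exp == 0x1F) return mant ? NAN : s * INFINITY;                // inf / NaN
    return s * std::ldexp((float) (mant | 0x400), (int) exp - 25);    // normal
}

// Unpack two halves from one 32-bit word (low half first), mirroring
// the behavior of WGSL's unpack2x16float builtin.
static std::pair<float, float> unpack_f16x2(uint32_t w) {
    return { half_to_float((uint16_t) (w & 0xFFFF)),
             half_to_float((uint16_t) (w >> 16)) };
}
```

This avoids any reliance on bit-level reinterpretation, which is the usual portability motivation for such a switch.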
✨ New Features
- Parameterize submission size in ggml-webgpu.
- Add iOS-specific submission limits in ggml-webgpu.
- Work toward removing bitcast in ggml.
- Move existing types over in ggml.
- Add per-browser parameters for in-flight submissions in ggml-webgpu.
- Update batch-size handling in ggml-webgpu.
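The submission-related items above share one idea: how many recorded commands go into a single queue submission, and how many submissions may be in flight, are now tunable per platform (e.g. tighter limits on iOS or in certain browsers). A minimal C++ sketch of that pattern; `submit_params` and `command_batcher` are hypothetical names, not the ggml-webgpu API:

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical tuning knobs, chosen per platform/browser.
struct submit_params {
    size_t max_batch;     // commands per queue submission
    size_t max_in_flight; // submissions allowed in flight at once
};

class command_batcher {
public:
    explicit command_batcher(submit_params p) : params(p) {}

    // Record one command; flush automatically once the batch-size
    // limit is reached.
    void record(std::function<void()> cmd) {
        pending.push_back(std::move(cmd));
        if (pending.size() >= params.max_batch) flush();
    }

    // Run all pending commands as one "submission".
    void flush() {
        if (pending.empty()) return;
        for (auto & cmd : pending) cmd();
        pending.clear();
        ++submissions;
    }

    size_t submissions = 0;

private:
    submit_params params;
    std::vector<std::function<void()>> pending;
};
```

With this shape, an iOS build would simply construct the batcher with a smaller `max_batch` than a desktop build, without touching the recording code.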
🐛 Bug Fixes
- Fix a stride issue.
- Remove a deadlock condition in free_bufs in ggml.
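A common cause of deadlocks in a `free_bufs`-style routine is freeing buffers while holding the pool's mutex, which hangs if teardown re-enters the pool. The standard fix is to detach the buffers under the lock and release them after it is dropped. A minimal C++ sketch under that assumption; `buffer_pool` and its members are hypothetical names, not ggml's actual data structures:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

struct buffer { int id; };

class buffer_pool {
public:
    void add(buffer * b) {
        std::lock_guard<std::mutex> lock(mtx);
        bufs.push_back(b);
    }

    // Deadlock-free variant: swap the list out while holding the lock,
    // then free each buffer with no lock held, so any code that runs
    // during teardown can safely re-enter the pool.
    size_t free_bufs() {
        std::vector<buffer *> to_free;
        {
            std::lock_guard<std::mutex> lock(mtx);
            to_free.swap(bufs);
        }
        size_t n = to_free.size();
        for (buffer * b : to_free) delete b;
        return n;
    }

private:
    std::mutex mtx;
    std::vector<buffer *> bufs;
};
```

The key design point is that the critical section only mutates bookkeeping; the potentially re-entrant work (destruction) happens outside it.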