b8698
📦 llama-cpp
✨ 6 features · 🐛 2 fixes · 🔧 1 symbol
Summary
This release focuses on optimizing and stabilizing the ggml-webgpu backend: submission sizes are now parameterized, iOS-specific submission limits were added, and a deadlock in free_bufs was removed. Internal refactoring moves existing types and simplifies profiling futures.
Migration Steps
- Migrate to `unpackf16` for wider compatibility (if applicable to your build process).
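The `unpackf16` migration appears related to the "removing bitcast" work below: instead of bitcasting raw bits, two f16 values packed in a 32-bit word are decoded explicitly (as WGSL's `unpack2x16float` does). A minimal host-side C++ sketch of that decoding, assuming IEEE 754 binary16 layout; the names `half_to_float` and `unpack_f16x2` are illustrative, not the project's API:

```cpp
#include <cmath>
#include <cstdint>
#include <utility>

// Decode one IEEE 754 binary16 value stored in a uint16_t.
// (Illustrative helper, not part of llama.cpp's API.)
static float half_to_float(uint16_t h) {
    uint32_t sign = (h >> 15) & 1;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    float s = sign ? -1.0f : 1.0f;
    if (exp == 0)    return s * std::ldexp((float) mant, -24);        // subnormal
    if (exp == 0x1F) return mant ? NAN : s * INFINITY;                // inf / NaN
    return s * std::ldexp((float) (mant | 0x400), (int) exp - 25);    // normal
}

// Unpack two halves from one 32-bit word (low half first), mirroring
// the behavior of WGSL's unpack2x16float builtin.
static std::pair<float, float> unpack_f16x2(uint32_t w) {
    return { half_to_float((uint16_t) (w & 0xFFFF)),
             half_to_float((uint16_t) (w >> 16)) };
}
```

This avoids any reliance on bit-level reinterpretation, which is the usual portability motivation for such a switch.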
✨ New Features
- Parameterize submission size in ggml-webgpu.
- Add iOS-specific submission limits in ggml-webgpu.
- Work toward removing bitcast in ggml.
- Move existing types over in ggml.
- Add per-browser parameters for in-flight submissions in ggml-webgpu.
- Update batch-size handling in ggml-webgpu.
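The submission-related items above share one idea: how many recorded commands go into a single queue submission, and how many submissions may be in flight, are now tunable per platform (e.g. tighter limits on iOS or in certain browsers). A minimal C++ sketch of that pattern; `submit_params` and `command_batcher` are hypothetical names, not the ggml-webgpu API:

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical tuning knobs, chosen per platform/browser.
struct submit_params {
    size_t max_batch;     // commands per queue submission
    size_t max_in_flight; // submissions allowed in flight at once
};

class command_batcher {
public:
    explicit command_batcher(submit_params p) : params(p) {}

    // Record one command; flush automatically once the batch-size
    // limit is reached.
    void record(std::function<void()> cmd) {
        pending.push_back(std::move(cmd));
        if (pending.size() >= params.max_batch) flush();
    }

    // Run all pending commands as one "submission".
    void flush() {
        if (pending.empty()) return;
        for (auto & cmd : pending) cmd();
        pending.clear();
        ++submissions;
    }

    size_t submissions = 0;

private:
    submit_params params;
    std::vector<std::function<void()>> pending;
};
```

With this shape, an iOS build would simply construct the batcher with a smaller `max_batch` than a desktop build, without touching the recording code.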
🐛 Bug Fixes
- Fix a stride issue.
- Remove a deadlock condition in free_bufs in ggml.
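A common cause of deadlocks in a `free_bufs`-style routine is freeing buffers while holding the pool's mutex, which hangs if teardown re-enters the pool. The standard fix is to detach the buffers under the lock and release them after it is dropped. A minimal C++ sketch under that assumption; `buffer_pool` and its members are hypothetical names, not ggml's actual data structures:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

struct buffer { int id; };

class buffer_pool {
public:
    void add(buffer * b) {
        std::lock_guard<std::mutex> lock(mtx);
        bufs.push_back(b);
    }

    // Deadlock-free variant: swap the list out while holding the lock,
    // then free each buffer with no lock held, so any code that runs
    // during teardown can safely re-enter the pool.
    size_t free_bufs() {
        std::vector<buffer *> to_free;
        {
            std::lock_guard<std::mutex> lock(mtx);
            to_free.swap(bufs);
        }
        size_t n = to_free.size();
        for (buffer * b : to_free) delete b;
        return n;
    }

private:
    std::mutex mtx;
    std::vector<buffer *> bufs;
};
```

The key design point is that the critical section only mutates bookkeeping; the potentially re-entrant work (destruction) happens outside it.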