b8833
📦 llama-cpp
🐛 3 fixes · 🔧 3 symbols
Summary
This release primarily addresses compiler warnings, refactors FlashAttention encoding in ggml-webgpu, and includes fixes for soft_max precision and a potential segfault on exit in the Vulkan backend.
Migration Steps
- Update workflows to remove dependence on llvmpipe.
🐛 Bug Fixes
- Fixed the soft_max calculation and switched reg_tile accumulation to f32 for improved precision.
- Mitigated potential segfaults during process exit on the Vulkan backend.
- Resolved compiler warnings related to parameter casting.