b9122
📦 llama-cpp
🐛 8 fixes · 🔧 6 symbols
Summary
This release focuses on fixing precision issues in ggml-webgpu for multimodal operations, along with fixes to the GELU activation functions and the flash attention paths.
🐛 Bug Fixes
- Addressed precision issues in multimodal operations in ggml-webgpu by computing in f32 and updating the shared-memory calculation logic.
- Corrected the `gelu`, `gelu_quick`, and `gelu_erf` functions.
- Fixed a hardcoded V type in flash-attn-tile.
- Fixed the tile path in flash_attn.
- Removed redundant pipeline keys.
- Removed inline min/max group size functions and reverted the flash attn path order.
- Used a clamp to avoid NaN in GELU.
- Used a safer range for exp in f32 (clamping the argument to 80, comfortably below the f32 overflow point near 88.7).