
b9122

📦 llama-cpp
🐛 8 fixes · 🔧 6 symbols

Summary

This release focuses on fixing precision issues within ggml-webgpu for multimodal operations, alongside numerous fixes for GELU functions and flash attention paths.

🐛 Bug Fixes

  • Addressed precision issues for multimodal operations in ggml-webgpu by computing in f32 and updating the shared-memory size calculation logic.
  • Corrected the gelu, gelu_quick, and gelu_erf functions.
  • Fixed a hardcoded V-tensor type in flash-attn-tile.
  • Fixed the tile path in flash_attn.
  • Removed redundant pipeline keys.
  • Removed the inline min/max group-size functions and reverted the flash-attention path order.
  • Clamped inputs in GELU to avoid NaN.
  • Used a safer range (±80, comfortably below f32's exp overflow threshold of ~88.7) for exp when computing in f32.

Affected Symbols