b9122
📦 llama-cpp
🐛 8 fixes · 🔧 6 symbols
Summary
This release focuses on fixing precision issues in ggml-webgpu for multimodal operations, along with fixes to the GELU activation functions and the flash attention paths.
🐛 Bug Fixes
- Addressed precision issues in multimodal operations in ggml-webgpu by computing in f32 and updating the shared-memory calculation logic.
- Corrected the `gelu`, `gelu_quick`, and `gelu_erf` functions.
- Fixed a hardcoded V type in flash-attn-tile.
- Fixed the tile path in flash_attn.
- Removed redundant pipeline keys.
- Removed inline min/max group size functions and reverted the flash attn path order.
- Used a clamp to avoid NaN in GELU.
- Used a safer range for exp in f32 (clamping the argument to 80, comfortably below the f32 overflow point near 88.7).