b8833
📦 llama-cpp
🐛 3 fixes · 🔧 3 symbols
Summary
This release primarily addresses compiler warnings, refactors FlashAttention encoding in ggml-webgpu, and includes fixes for soft_max precision and a potential segfault on exit in the Vulkan backend.
Migration Steps
- Update workflows to remove dependence on llvmpipe.
🐛 Bug Fixes
- Fixed the soft_max calculation and switched reg_tile accumulation to f32 for improved precision.
- Mitigated potential segfaults during process exit on the Vulkan backend.
- Resolved compiler warnings related to parameter casting.