b9776
📦 llama-cpp
🐛 1 fix · 🔧 2 symbols
Summary
This release fixes a numerical stability issue in the Vulkan backend by changing the order of operations in the Flash Attention (FA) kernels. It also ships pre-built binaries for a wide range of platforms.
🐛 Bug Fixes
- The Vulkan backend now applies the bias before the softmax in Flash Attention (FA) kernels, which prevents overflow in the exponentials.
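
To illustrate why the ordering matters, here is a minimal Python sketch of a max-subtracted softmax. It is not the Vulkan shader code: the function names, the sample values, and the assumption that the overflow comes from taking the running maximum before the bias is added are all illustrative guesses, not details confirmed by the release notes.

```python
import math

def softmax_bias_after_max(scores, bias):
    # Hypothetical "unsafe" ordering: the maximum is taken from the raw
    # scores, and the bias is only added inside the exponent afterwards.
    m = max(scores)
    exps = [math.exp(s + b - m) for s, b in zip(scores, bias)]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_bias_before_max(scores, bias):
    # Hypothetical "safe" ordering: the bias is added first, so the maximum
    # covers the biased values and every exponent is <= 0.
    biased = [s + b for s, b in zip(scores, bias)]
    m = max(biased)
    exps = [math.exp(x - m) for x in biased]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up values: one large positive bias is enough to break the first ordering.
scores = [1.0, 2.0, 3.0]
bias = [0.0, 800.0, 0.0]

safe = softmax_bias_before_max(scores, bias)

try:
    softmax_bias_after_max(scores, bias)
    overflowed = False
except OverflowError:
    # exp(2 + 800 - 3) exceeds the double-precision range.
    overflowed = True
```

With the bias folded in before the maximum is computed, the largest exponent is exactly zero, so the exponentials stay in the range (0, 1]. A GPU kernel working in half or single precision would hit the overflow at far smaller magnitudes than this double-precision example.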