b9776

📦 llama-cpp
🐛 1 fix · 🔧 2 symbols

Summary

This release fixes a numerical stability issue in the Vulkan backend by changing the order of operations in the Flash Attention (FA) kernels: the bias is now applied before the softmax. Pre-built binaries are also provided for multiple platforms.

🐛 Bug Fixes

  • The Vulkan backend now applies the bias before the softmax in Flash Attention (FA) kernels to prevent potential overflow.
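
The note above does not spell out the mechanism, so the following is only a generic illustration of why the ordering can matter, not the Vulkan shader code. It assumes a max-subtracted softmax and a hypothetical large positive bias. Both function names are made up for this sketch. If the maximum is taken before the bias is added, the exponent argument can become large and positive; adding the bias first keeps every argument at or below zero.

```python
import math


def softmax_bias_first(scores, bias):
    """Add the bias to the scores, then run a max-subtracted softmax."""
    shifted = [s + b for s, b in zip(scores, bias)]
    m = max(shifted)  # the max already includes the bias
    exps = [math.exp(x - m) for x in shifted]  # every argument is <= 0
    total = sum(exps)
    return [e / total for e in exps]


def softmax_bias_late(scores, bias):
    """Hypothetical unsafe ordering: max taken before the bias is added."""
    m = max(scores)  # the max ignores the bias
    # s - m + b can be large and positive, so exp() may overflow.
    exps = [math.exp(s - m + b) for s, b in zip(scores, bias)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [1.0, 2.0]
    bias = [0.0, 800.0]  # illustrative value chosen to exceed the float64 exp range

    print(softmax_bias_first(scores, bias))
    try:
        softmax_bias_late(scores, bias)
    except OverflowError as err:
        print("late bias overflowed:", err)
```

For small biases the two orderings give the same result up to rounding; they diverge only when the bias pushes the exponent argument out of range. On a GPU the same effect would appear much earlier in half precision, where the largest finite value is 65504.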

Affected Symbols