b9776

📅 Jun 24, 2026 📦 llama-cpp

🐛 1 fix 🔧 2 symbols

Summary

This release primarily addresses a numerical stability issue in the Vulkan backend: the Flash Attention (FA) kernels now apply the bias before softmax. It also ships pre-built binaries for numerous platforms.

🐛 Bug Fixes

The Vulkan backend now applies the bias before softmax in its Flash Attention (FA) kernels to prevent potential overflow.
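To illustrate why the ordering matters, here is a minimal Python sketch of one way an overflow of this kind can arise. It is not the Vulkan shader code, and the release note does not describe what the kernels did previously. The function names and the "bias after max" variant are hypothetical, chosen only to contrast the two orderings. A stable softmax subtracts the maximum logit before exponentiating; if that maximum is taken before the bias is added, a large positive bias can still push the exponent out of range.

```python
import math

def softmax_bias_before_max(scores, bias):
    """Add the bias first, then subtract the max of the biased logits."""
    logits = [s + b for s, b in zip(scores, bias)]
    m = max(logits)
    # Every exponent is <= 0 here, so exp() cannot overflow.
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_bias_after_max(scores, bias):
    """Hypothetical unsafe ordering: max taken from raw scores, bias added later."""
    m = max(scores)
    # The exponent is (s - m + b); a large positive b makes it unbounded.
    exps = [math.exp(s - m + b) for s, b in zip(scores, bias)]
    total = sum(exps)
    return [e / total for e in exps]
```

With small biases the two functions agree up to rounding. With a bias such as 800.0 on one element, the second function raises `OverflowError` in double precision, while the first still returns a valid distribution. In half precision, as commonly used on GPUs, the safe exponent range is far smaller, so the same problem would appear with much smaller bias values.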

Affected Symbols

Vulkan FA (Flash Attention)