b9851

📅 Jun 30, 2026 · 📦 llama-cpp

🐛 1 fix · 🔧 1 symbol

Summary

This release fixes a CUDA-specific bug: integer truncation and overflow in the handling of KQ mask strides in the flash_attn_mask_to_KV_max kernel. It also provides updated pre-compiled binaries for numerous platforms.

🐛 Bug Fixes

Prevented integer truncation and overflow errors when using KQ mask strides in the flash_attn_mask_to_KV_max kernel on CUDA.
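
The release note does not show the change itself, so the following is only a generic sketch of this class of bug, not the actual llama.cpp patch. It assumes a row-major mask indexed as row * row_stride + col. The helper names mask_offset_64 and mask_offset_32 and the numeric values are hypothetical and chosen purely for illustration.

```cpp
#include <cstdint>

// Hypothetical helper (not from llama.cpp): offset of element (row, col)
// in a row-major mask, computed entirely in 64-bit arithmetic.
constexpr int64_t mask_offset_64(int64_t row, int64_t row_stride, int64_t col) {
    return row * row_stride + col;
}

// The same computation after narrowing everything to 32 bits. Unsigned types
// are used here so that the wrap-around is well defined; with signed int the
// overflow would be undefined behavior.
constexpr uint32_t mask_offset_32(uint32_t row, uint32_t row_stride, uint32_t col) {
    return row * row_stride + col;
}

// Illustrative values only: 70000 * 70000 = 4,900,000,000, which exceeds 2^32.
static_assert(mask_offset_64(70000, 70000, 0) == 4900000000LL,
              "64-bit offset is exact");
static_assert(mask_offset_32(70000, 70000, 0) != 4900000000LL,
              "32-bit offset has wrapped around");
```

The usual remedy for this kind of problem is to keep strides and the products derived from them in a 64-bit type until the final pointer arithmetic, which is consistent with the fix described above.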

Affected Symbols

flash_attn_mask_to_KV_max kernel