Change8

b9851

📦 llama-cpp
🐛 1 fix · 🔧 1 symbol

Summary

This release primarily fixes a CUDA-specific integer-handling bug in the flash attention mask kernel. It also ships updated pre-compiled binaries for a range of platforms.

🐛 Bug Fixes

  • Prevented integer truncation and overflow errors when using KQ mask strides in the flash_attn_mask_to_KV_max kernel on CUDA.

Affected Symbols