b9851
📦 llama-cpp
🐛 1 fix 🔧 1 symbol
Summary
This release fixes a CUDA-specific bug in the flash attention mask kernel, where KQ mask strides could be truncated or overflow. It also provides updated pre-compiled binaries across numerous platforms.
🐛 Bug Fixes
- Prevented integer truncation and overflow errors when using KQ mask strides in the flash_attn_mask_to_KV_max kernel on CUDA.
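As a rough illustration of this class of bug, here is a minimal host-side C++ sketch of how a row-times-stride offset can wrap when computed in 32 bits and stay correct when computed in 64 bits. It is not the actual llama.cpp change: the helper names `offset_32` and `offset_64` and the example values are hypothetical, and unsigned arithmetic is used in the narrow version only so that the wraparound is well defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from llama.cpp): offset computed in 32 bits.
// Unsigned arithmetic wraps modulo 2^32, so a large row * stride product
// silently yields a small, wrong offset.
static uint32_t offset_32(uint32_t row, uint32_t stride) {
    return row * stride;
}

// Hypothetical helper (not from llama.cpp): the same offset computed in
// 64 bits. Widening the operands before the multiply keeps the product exact.
static int64_t offset_64(int64_t row, int64_t stride) {
    assert(row >= 0 && stride >= 0);  // offsets are assumed non-negative here
    return row * stride;
}
```

For example, with `row = 70000` and `stride = 70000` the exact product is 4,900,000,000, which does not fit in 32 bits: `offset_32` returns 605,032,704 while `offset_64` returns the exact value. The same reasoning applies inside a kernel when strides are stored or multiplied in a narrower type than the tensor layout requires.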