b8470
📦 llama-cpp
✨ 1 feature · 🐛 2 fixes · 🔧 1 symbol
Summary
This release introduces native bf16 flash attention for the ggml-cuda vec kernel and addresses several platform-specific build and CI failures across different hardware targets.
Migration Steps
- The earlier tile kernel changes were reverted to avoid a larger refactor; if you relied on those tile kernel modifications, note that they are not included in this release.
✨ New Features
- Added native bf16 flash attention support for the ggml-cuda vec kernel.
🐛 Bug Fixes
- Fixed CI failures observed on Turing and HIP architectures.
- Resolved an issue preventing the bf16 vec kernel from compiling on HIP platforms that use v_dot2.