
b8470

📦 llama-cpp
✨ 1 feature · 🐛 2 fixes · 🔧 1 symbol

Summary

This release introduces native bf16 flash attention for the ggml-cuda vec kernel and fixes platform-specific build and CI failures on Turing and HIP hardware targets.

Migration Steps

  1. Note that tile kernel changes from earlier builds were reverted to avoid a larger refactor; if you relied on those modifications, they are not present in this release.

✨ New Features

  • Added native bf16 flash attention support for the ggml-cuda vec kernel.

🐛 Bug Fixes

  • Fixed CI failures observed on Turing and HIP architectures.
  • Resolved an issue preventing bf16 vec kernel compilation on HIP v_dot2 platforms.

Affected Symbols