b8470
📦 llama-cpp
✨ 1 feature · 🐛 2 fixes · 🔧 1 symbol
Summary
This release introduces native bf16 flash attention for the ggml-cuda vec kernel and addresses several platform-specific build and CI failures across different hardware targets.
Migration Steps
- The earlier tile kernel changes were reverted to avoid a larger refactor; if you relied on those tile kernel modifications, note that they are not included in this release.
✨ New Features
- Added native bf16 flash attention support for the ggml-cuda vec kernel.
🐛 Bug Fixes
- Fixed CI failures observed on Turing and HIP architectures.
- Resolved an issue preventing the bf16 vec kernel from compiling on HIP platforms that use v_dot2.