b8724
📦 llama-cpp
✨ 2 features · 🐛 3 fixes · 🔧 2 symbols
Summary
This release improves SYCL performance by adding Flash Attention support for head size 512, cleans up backend buffer-initialization logic, and removes defunct mxfp4 reordering logic.
✨ New Features
- Added SYCL Flash Attention support for head sizes (DKQ/DV) of 512.
- Updated kernel selection logic in SYCL Flash Attention to allow vector kernels for head sizes up to 512 (previously 256).
🐛 Bug Fixes
- Removed unused/redundant AMD and RDNA-specific configuration functions in `fattn-tile.hpp`.
- Refactored `ggml_backend_sycl_buffer_init_tensor` to use a switch statement for clearer tensor extra buffer initialization.
- Removed defunct mxfp4 reordering logic from the buffer-type setup path.