b8724
📦 llama-cpp
✨ 2 features · 🐛 3 fixes · 🔧 2 symbols
Summary
This release improves SYCL performance by adding Flash Attention support for head size 512, cleans up backend buffer-initialization logic, and removes defunct mxfp4 reordering logic.
✨ New Features
- Added SYCL Flash Attention support for head sizes (DKQ/DV) of 512.
- Updated kernel selection logic in SYCL Flash Attention to allow vector kernels for head sizes up to 512 (previously 256).
🐛 Bug Fixes
- Removed unused/redundant AMD and RDNA-specific configuration functions in `fattn-tile.hpp`.
- Refactored `ggml_backend_sycl_buffer_init_tensor` to use a switch statement for clearer tensor extra buffer initialization.
- Removed defunct mxfp4 reordering logic from the buffer-type setup path.