Changelog

b8724

📦 llama-cpp
✨ 2 features · 🐛 3 fixes · 🔧 2 symbols

Summary

This release improves the SYCL backend by adding Flash Attention support for head size 512 and cleaning up backend buffer-initialization logic. It also removes defunct mxfp4 reordering logic.

✨ New Features

  • Added SYCL Flash Attention support for head sizes (DKQ/DV) of 512.
  • Updated kernel selection logic in SYCL Flash Attention to allow vector kernels for head sizes up to 512 (previously 256).

🐛 Bug Fixes

  • Removed unused/redundant AMD and RDNA-specific configuration functions in `fattn-tile.hpp`.
  • Refactored `ggml_backend_sycl_buffer_init_tensor` to use a switch statement for clearer tensor extra buffer initialization.
  • Removed defunct mxfp4 reorder from setting buffer type.

Affected Symbols