Changelog

b8234

📦 llama-cpp
✨ 1 feature · 🐛 2 fixes · 🔧 1 symbol

Summary

This release adds Flash Attention support to the SYCL backend across multiple quantization formats, along with minor internal cleanups: a warning removal and a JIT-compilation update.

✨ New Features

  • Added Flash Attention support on the SYCL backend for the fp32, fp16, Q4, Q5, and Q8 quantization levels.
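As a rough sketch of how this feature would typically be exercised: llama.cpp is built with the SYCL backend enabled and Flash Attention is requested at run time. The cmake options and the `-fa` flag follow llama.cpp's documented SYCL build instructions and CLI help, but exact flag names can vary between versions, and the model path below is a placeholder:

```shell
# Build llama.cpp with the SYCL backend (Intel oneAPI icx/icpx compilers assumed).
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release

# Run with Flash Attention enabled (-fa / --flash-attn). With this release,
# this path is expected to work for fp32/fp16 models as well as Q4/Q5/Q8
# quantizations on SYCL devices. "model-q4_0.gguf" is a placeholder path.
./build/bin/llama-cli -m model-q4_0.gguf -fa -p "Hello" -n 32
```

This is a build/run configuration sketch, not something verifiable without a SYCL-capable device and the oneAPI toolchain installed.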

🐛 Bug Fixes

  • Removed a warning message.
  • Updated code for Just-In-Time (JIT) compilation.

Affected Symbols