b8234
📦 llama-cpp
✨ 1 feature · 🐛 2 fixes · 🔧 1 symbol
Summary
This release adds Flash Attention support to the SYCL backend across multiple quantization formats, along with minor internal cleanups such as removing a warning message and updating the JIT compilation code.
✨ New Features
- Added Flash Attention support to the SYCL backend for FP32, FP16, Q4, Q5, and Q8 quantization formats.
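As a rough sketch of how this feature might be exercised, the commands below build llama.cpp with the SYCL backend and run inference with Flash Attention enabled. The oneAPI environment path and the model filename are assumptions for illustration, and the exact flag names (e.g. `-fa`) can vary between llama.cpp versions:

```shell
# Build llama.cpp with the SYCL backend (assumes Intel oneAPI icx/icpx are installed)
source /opt/intel/oneapi/setvars.sh
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release

# Run inference with Flash Attention enabled; the Q4 model path is hypothetical
./build/bin/llama-cli -m models/model-q4_k_m.gguf -fa -p "Hello" -n 32
```

Flash Attention avoids materializing the full attention matrix, which mainly reduces memory use for long contexts; with this release that optimization also applies when the KV cache or weights use the listed quantized types on SYCL devices.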
🐛 Bug Fixes
- Removed a spurious warning message.
- Updated the Just-In-Time (JIT) compilation code.