b9088
📦 llama-cpp
✨ 1 feature · 🐛 1 fix · 🔧 4 symbols
Summary
This release adds BF16 support to the SYCL backend's GET_ROWS operation, removing a performance bottleneck for models that use BF16 embeddings. Pre-compiled binaries for a range of platforms are also provided.
✨ New Features
- Added BF16 support to the SYCL backend's GET_ROWS operation.
🐛 Bug Fixes
- Fixed a performance issue where models with BF16 embedding tensors (such as Gemma4's per_layer_token_embd.weight) fell back to the CPU for the GET_ROWS operation, incurring unnecessary GPU-to-CPU tensor transfers.