b9088
📦 llama-cpp
✨ 1 feature · 🐛 1 fix · 🔧 4 symbols
Summary
This release adds BF16 support to the SYCL backend's GET_ROWS operation, removing a performance bottleneck for models that use BF16 embeddings. Pre-compiled binaries for a range of platforms are also provided.
✨ New Features
- Added BF16 support to the SYCL backend's GET_ROWS operation.
🐛 Bug Fixes
- Fixed a performance issue where models with BF16 embedding tensors (such as Gemma4's per_layer_token_embd.weight) fell back to the CPU for the GET_ROWS operation, incurring unnecessary GPU-to-CPU tensor transfers.