b9088

📦 llama-cpp
✨ 1 feature · 🐛 1 fix · 🔧 4 symbols

Summary

This release adds BF16 support to the SYCL backend's GET_ROWS operation, resolving a performance bottleneck for models that use BF16 embeddings. Pre-compiled binaries for a wide range of platforms are also provided.

✨ New Features

  • Added BF16 support to the SYCL backend's GET_ROWS operation.

🐛 Bug Fixes

  • Fixed a performance issue where models using BF16 embedding tensors (such as Gemma4's per_layer_token_embd.weight) fell back to the CPU for the GET_ROWS op, incurring unnecessary GPU-to-CPU tensor transfers.

Affected Symbols