b8711
📦 llama-cpp
Summary
This release optimizes the Gemma model's compute graph by restructuring per-layer projection operations: they are kept within the first (input) layer and moved ahead of the main layer loop, which reduces the number of graph splits. Pre-compiled binaries are provided for various operating systems and hardware configurations.
✨ New Features
- Gemma model implementation now performs per-layer projections in the first layer for potential performance improvements.
- Reduced graph splits in Gemma by keeping per-layer operations within the input layer.
- Projection logic for Gemma moved before the layer loop in the continuous inference path.