b8711
📦 llama-cpp
Summary
This release optimizes the Gemma model's compute graph by restructuring per-layer projection operations: they are kept within the first (input) layer and moved ahead of the main layer loop, which reduces the number of graph splits. Pre-compiled binaries are provided for various operating systems and hardware configurations.
✨ New Features
- Gemma model implementation now performs per-layer projections in the first layer for potential performance improvements.
- Reduced graph splits in Gemma by keeping per-layer operations within the input layer.
- Projection logic for Gemma moved before the layer loop in the continuous inference path.