Changelog

b8711

📦 llama-cpp
✨ 3 features · 🔧 1 symbol

Summary

This release introduces architectural optimizations for the Gemma model by restructuring projection operations within the first layer and before the main layer loop. Pre-compiled binaries are provided for various operating systems and hardware configurations.

✨ New Features

  • The Gemma model implementation now performs per-layer projections in the first layer, enabling potential performance improvements.
  • Graph splits in Gemma are reduced by keeping per-layer operations within the input layer.
  • In the continuous inference path, Gemma's projection logic is moved ahead of the layer loop.

Affected Symbols