
b8766

📦 llama-cpp
6 features · 🐛 4 fixes · 🔧 3 symbols

Summary

This release adds support for the Gemma 4 audio Conformer encoder, covering both its architecture and its mel-spectrogram preprocessing. It also fixes tensor-loading deduplication, the load order of ClippableLinear clamp data, and the sliding-window attention mask.

✨ New Features

  • Added support for the Gemma 4 audio conformer encoder, a USM-style Conformer.
  • Implemented the 12-layer Conformer architecture used for Gemma 4 audio processing.
  • Added a subsampling conv projection: 2x Conv2D (stride=2) with LayerNorm.
  • Implemented full self-attention with sinusoidal relative position embeddings (RPE) and a sliding-window mask (24 positions).
  • Added logit softcapping at 50.0 and ClippableLinear clamping.
  • Introduced dedicated mel preprocessing via mtmd_audio_preprocessor_gemma4a: HTK mel scale, 128 mel bins, magnitude STFT, and mel_floor=1e-3.
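
A minimal C++ sketch of three of the encoder pieces listed above: the time length after the two stride-2 convolutions, logit softcapping (cap * tanh(x / cap)), and an additive sliding-window attention mask. It assumes kernel size 3 with padding 1 for the convolutions and a causal, left-looking window of 24 positions; all function names are hypothetical, and the actual Gemma 4 mask in llama.cpp may be shaped differently (for example chunked, or with right context).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Output length of one convolution along time: floor((n + 2*pad - kernel) / stride) + 1.
// ASSUMPTION: kernel = 3 and pad = 1 are illustrative; only stride = 2 comes from the changelog.
int conv_out_len(int n, int kernel = 3, int stride = 2, int pad = 1) {
    return (n + 2 * pad - kernel) / stride + 1;
}

// Two stacked stride-2 convolutions shrink the time axis by roughly 4x.
int subsampled_length(int n_frames) {
    return conv_out_len(conv_out_len(n_frames));
}

// Logit softcapping: smoothly squashes x into (-cap, cap).
float softcap(float x, float cap) {
    assert(cap > 0.0f);
    return cap * std::tanh(x / cap);
}

// Additive mask of size n x n (row = query i, column = key j), stored row-major.
// ASSUMPTION: query i may attend to key j iff 0 <= i - j < window.
// Masked entries hold -inf so they become zero after softmax.
std::vector<float> sliding_window_mask(int n, int window) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    std::vector<float> mask(static_cast<std::size_t>(n) * n, neg_inf);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            const int dist = i - j;
            if (dist >= 0 && dist < window) {
                mask[static_cast<std::size_t>(i) * n + j] = 0.0f;
            }
        }
    }
    return mask;
}

// Softcap the raw attention scores first, then add the mask.
std::vector<float> masked_scores(const std::vector<float>& scores,
                                 const std::vector<float>& mask, float cap) {
    assert(scores.size() == mask.size());
    std::vector<float> out(scores.size());
    for (std::size_t k = 0; k < scores.size(); ++k) {
        out[k] = softcap(scores[k], cap) + mask[k];
    }
    return out;
}
```

Capping before masking keeps masked positions at -inf, so the 50.0 cap only bounds the scores that survive the window.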

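The HTK mel scale maps a frequency f in Hz to 2595 * log10(1 + f / 700). The C++ sketch below builds a 128-band HTK triangular filterbank, projects one frame of STFT magnitudes onto it, and clamps each band at mel_floor=1e-3. The function names, the unnormalized triangles, the example parameters (16 kHz audio, n_fft=512), and the final log step are assumptions for illustration; this is not the code of mtmd_audio_preprocessor_gemma4a.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// HTK mel scale and its inverse.
double hz_to_mel_htk(double hz) { return 2595.0 * std::log10(1.0 + hz / 700.0); }
double mel_to_hz_htk(double mel) { return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0); }

// Triangular filterbank of shape n_mels x (n_fft / 2 + 1).
// ASSUMPTION: unnormalized triangles whose edges are evenly spaced on the HTK mel axis.
std::vector<std::vector<double>> mel_filterbank(int n_mels, int n_fft, double sample_rate,
                                                double f_min, double f_max) {
    const int n_freqs = n_fft / 2 + 1;
    const double mel_lo = hz_to_mel_htk(f_min);
    const double mel_hi = hz_to_mel_htk(f_max);
    // n_mels + 2 edge frequencies: filter m spans edges[m]..edges[m + 2], peaking at edges[m + 1].
    std::vector<double> edges(n_mels + 2);
    for (int m = 0; m < n_mels + 2; ++m) {
        edges[m] = mel_to_hz_htk(mel_lo + (mel_hi - mel_lo) * m / (n_mels + 1));
    }
    std::vector<std::vector<double>> fb(n_mels, std::vector<double>(n_freqs, 0.0));
    for (int m = 0; m < n_mels; ++m) {
        const double left = edges[m], center = edges[m + 1], right = edges[m + 2];
        for (int k = 0; k < n_freqs; ++k) {
            const double f = k * sample_rate / n_fft;  // frequency of FFT bin k
            const double up = (f - left) / (center - left);
            const double down = (right - f) / (right - center);
            fb[m][k] = std::max(0.0, std::min(up, down));
        }
    }
    return fb;
}

// Mel energies for one frame of STFT magnitudes (not squared), floored at mel_floor.
// ASSUMPTION: the natural log applied after flooring is illustrative.
std::vector<double> log_mel_frame(const std::vector<double>& magnitude,
                                  const std::vector<std::vector<double>>& fb,
                                  double mel_floor = 1e-3) {
    std::vector<double> out(fb.size());
    for (std::size_t m = 0; m < fb.size(); ++m) {
        assert(fb[m].size() == magnitude.size());
        double energy = 0.0;
        for (std::size_t k = 0; k < magnitude.size(); ++k) {
            energy += fb[m][k] * magnitude[k];
        }
        out[m] = std::log(std::max(energy, mel_floor));
    }
    return out;
}
```

For example, mel_filterbank(128, 512, 16000.0, 0.0, 8000.0) yields 128 filters over 257 FFT bins, and a silent frame maps every band to log(1e-3). The floor also covers very narrow low-frequency filters that fall between FFT bins and would otherwise produce log(0).
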
🐛 Bug Fixes

  • Fixed tensor-loading deduplication: a std::set guard now prevents get_tensor() from creating duplicate entries in ctx_data.
  • Moved ClippableLinear clamp_info loading so that it runs after the per-layer tensors are loaded.
  • Matched the sliding-window mask (24 positions) to the PyTorch context_size.
  • Skipped Whisper normalization for Gemma 4 mel output.
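
The deduplication fix above can be sketched with a std::set of already-registered names: std::set::insert returns a pair whose .second is true only when the name was not present, so a repeated get_tensor() call can be detected before anything is appended. The tensor_loader and tensor_entry types below are hypothetical and much simpler than llama.cpp's real loader; only the names get_tensor and ctx_data come from the changelog.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// HYPOTHETICAL entry type: just the tensor name, for illustration.
struct tensor_entry {
    std::string name;
};

// HYPOTHETICAL loader showing the std::set guard; not llama.cpp's actual code.
struct tensor_loader {
    std::vector<tensor_entry> ctx_data;  // tensors queued for loading
    std::set<std::string> seen;          // names already added to ctx_data

    // Returns the index of the entry for `name`, appending it only on first use.
    std::size_t get_tensor(const std::string& name) {
        if (seen.insert(name).second) {  // true only if `name` was not already in the set
            ctx_data.push_back({name});
            return ctx_data.size() - 1;
        }
        // Already registered: return the existing entry instead of adding a duplicate.
        for (std::size_t i = 0; i < ctx_data.size(); ++i) {
            if (ctx_data[i].name == name) {
                return i;
            }
        }
        return ctx_data.size();  // unreachable while `seen` and `ctx_data` stay in sync
    }
};
```

Returning an index rather than a reference keeps the result valid even after later push_back calls reallocate ctx_data.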

Affected Symbols