b8766

📅 Apr 12, 2026 📦 llama-cpp

✨ 6 features 🐛 4 fixes 🔧 3 symbols

Summary

This release adds support for the Gemma 4 audio conformer encoder, including its 12-layer Conformer architecture and a dedicated mel preprocessing path. It also fixes duplicate entries created during tensor loading and aligns the sliding window mask with the PyTorch reference.

✨ New Features

Added support for the Gemma 4 audio conformer encoder via a USM-style Conformer.
Implemented 12-layer Conformer architecture for Gemma 4 audio processing.
Added Subsampling Conv Projection: 2x Conv2D(stride=2) with LayerNorm.
Implemented full self-attention with sinusoidal RPE and sliding window mask (24 positions).
Added logit softcapping at 50.0 and ClippableLinear clamping.
Introduced dedicated mel preprocessing via mtmd_audio_preprocessor_gemma4a using HTK mel scale, 128 bins, magnitude STFT, and mel_floor=1e-3.
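
The numeric pieces named in the list above can be sketched in a few lines of C++. This is a minimal illustration, not the code from this release: it assumes softcapping uses the common cap * tanh(x / cap) form, that ClippableLinear clamping is a plain min/max clamp, and that the HTK mel scale uses the standard 2595 * log10(1 + f / 700) formula. The names softcap, clamp_range, clamp_value, and hz_to_mel_htk are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Soft-cap a logit (assumed form: cap * tanh(x / cap)).
// The output stays inside [-cap, cap] and is close to x when |x| << cap.
inline float softcap(float x, float cap = 50.0f) {
    return cap * std::tanh(x / cap);
}

// Hypothetical stand-in for the clamp bounds a ClippableLinear layer carries.
struct clamp_range {
    float lo;
    float hi;
};

// Hard clamp of a value into [lo, hi] (assumed behaviour of the clamping step).
inline float clamp_value(float x, const clamp_range & r) {
    return std::max(r.lo, std::min(x, r.hi));
}

// Standard HTK mel-scale conversion from frequency in Hz to mels.
inline float hz_to_mel_htk(float hz) {
    return 2595.0f * std::log10(1.0f + hz / 700.0f);
}
```

With a cap of 50.0, small logits pass through almost unchanged while very large ones saturate at the cap, which is the usual motivation for soft rather than hard capping.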

🐛 Bug Fixes

Fixed tensor loading deduplication: a std::set guard now prevents get_tensor() from creating duplicate entries in ctx_data.
Moved ClippableLinear clamp_info loading so that it runs after the per-layer tensors are loaded.
Matched sliding window mask (24 positions) to PyTorch context_size.
Skipped Whisper normalization for Gemma 4 mel output.
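
The std::set guard from the first fix can be illustrated with a small sketch. It assumes only what the note states: a set of already-seen names stops a repeated lookup from appending a second entry. The tensor_entry and tensor_loader types, and this simplified get_tensor() signature, are hypothetical stand-ins rather than the actual mtmd code.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical placeholder for whatever is stored per tensor.
struct tensor_entry {
    std::string name;
};

struct tensor_loader {
    std::vector<tensor_entry> ctx_data;  // stand-in for the real ctx_data
    std::set<std::string>     seen;      // guard: names already registered

    // Registers a tensor by name. Returns true if a new entry was created,
    // false if the name was already present and nothing was appended.
    bool get_tensor(const std::string & name) {
        // std::set::insert returns {iterator, inserted}; inserted is false
        // when the key already exists, so duplicates are skipped here.
        if (!seen.insert(name).second) {
            return false;
        }
        ctx_data.push_back(tensor_entry{name});
        return true;
    }
};
```

Requesting the same name twice leaves ctx_data with a single entry, which is the property the fix is described as restoring.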

Affected Symbols

mtmd
get_tensor()
ClippableLinear