b8766
📦 llama-cpp
✨ 6 features · 🐛 4 fixes · 🔧 3 symbols
Summary
This release adds support for the Gemma 4 audio conformer encoder, including its 12-layer Conformer architecture and a dedicated mel preprocessing path. It also fixes tensor loading deduplication, the load order of ClippableLinear clamp_info, the sliding window mask size, and normalization of the Gemma 4 mel output.
✨ New Features
- Added support for the Gemma 4 audio conformer encoder via a USM-style Conformer.
- Implemented 12-layer Conformer architecture for Gemma 4 audio processing.
- Added Subsampling Conv Projection: 2x Conv2D(stride=2) with LayerNorm.
- Implemented full self-attention with sinusoidal RPE and sliding window mask (24 positions).
- Added logit softcapping at 50.0 and ClippableLinear clamping.
- Introduced dedicated mel preprocessing via mtmd_audio_preprocessor_gemma4a using HTK mel scale, 128 bins, magnitude STFT, and mel_floor=1e-3.
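Several of the numeric pieces listed above follow well-known formulas. The block below is a minimal sketch of them, not the code from this release: the function names are hypothetical, the softcap is written in the usual `cap * tanh(x / cap)` form, the clamp bounds are placeholders for whatever clamp_info supplies, and taking a natural log after flooring is an assumption about how mel_floor is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Logit softcapping: squashes x smoothly into [-cap, cap].
// cap = 50.0 matches the value in the release notes; the tanh form is assumed.
inline float softcap(float x, float cap = 50.0f) {
    assert(cap > 0.0f);
    return cap * std::tanh(x / cap);
}

// Hard clamp of the kind a ClippableLinear layer could apply.
// lo/hi are hypothetical; the real bounds would come from clamp_info.
inline float clip(float x, float lo, float hi) {
    assert(lo <= hi);
    return std::min(std::max(x, lo), hi);
}

// HTK mel scale: mel = 2595 * log10(1 + hz / 700).
inline double hz_to_mel_htk(double hz) {
    return 2595.0 * std::log10(1.0 + hz / 700.0);
}

// Inverse of the HTK mel scale.
inline double mel_htk_to_hz(double mel) {
    return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0);
}

// Floors a mel magnitude so the log stays finite for silent bins.
// Natural log is an assumption made for this sketch.
inline float log_mel(float mel_mag, float mel_floor = 1e-3f) {
    return std::log(std::max(mel_mag, mel_floor));
}
```

Small inputs pass through `softcap` almost unchanged, while large ones saturate near ±50. On the HTK scale, 1000 Hz maps to roughly 1000 mel, which is a quick sanity check for any implementation.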
🐛 Bug Fixes
- Fixed tensor loading deduplication by using a std::set guard to prevent get_tensor() from creating duplicate entries in ctx_data.
- Moved ClippableLinear clamp_info loading to occur after per-layer tensors.
- Matched sliding window mask (24 positions) to PyTorch context_size.
- Skipped Whisper normalization for Gemma 4 mel output.
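The deduplication fix can be illustrated with a small sketch of a std::set guard. This is not the actual loader code: `TensorLoader`, its members, and the tensor names used with it are hypothetical stand-ins, and only the idea of checking a set before registering a tensor is taken from the notes above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a loader that must register each tensor once.
struct TensorLoader {
    std::vector<std::string> to_load; // tensors that will be created in the data context
    std::set<std::string>    seen;    // guard: names that were already registered

    // Returns true if the name was newly registered, false if it was a duplicate.
    bool get_tensor(const std::string & name) {
        assert(!name.empty());
        // std::set::insert returns a pair whose bool is false when the key already exists.
        if (!seen.insert(name).second) {
            return false;
        }
        to_load.push_back(name);
        return true;
    }
};
```

With a guard like this, requesting the same name twice leaves a single entry in `to_load`, so a tensor looked up from more than one code path is not allocated or read twice.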