
b8762

📦 llama-cpp
4 features · 🐛 2 fixes · 🔧 1 symbol

Summary

This release adds support for the MERaLiON-2 multimodal audio model, covering its architecture components and supported tasks. It also cleans up comments in the MERaLiON adaptor.

Migration Steps

  1. When generating the mmproj GGUF for MERaLiON-2, use convert_hf_to_gguf.py --mmproj on the full model directory (architecture: MERaLiON2ForConditionalGeneration).
  2. The decoder must be converted separately as a standard Gemma2 model after stripping the text_decoder weight prefix.
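
The prefix-stripping in step 2 can be sketched roughly as follows. This is a minimal illustration, not the project's own tooling: it assumes the decoder tensors are named with a literal `text_decoder.` prefix and that all other tensors (encoder, adaptor) belong in the mmproj file rather than the decoder file. The helper name `strip_decoder_prefix` and the tensor names in the toy checkpoint are hypothetical; inspect the real checkpoint's tensor names before relying on any of them.

```python
from typing import Dict, TypeVar

T = TypeVar("T")

# Assumed prefix; verify against the tensor names in the actual checkpoint.
DECODER_PREFIX = "text_decoder."


def strip_decoder_prefix(tensors: Dict[str, T], prefix: str = DECODER_PREFIX) -> Dict[str, T]:
    """Keep only tensors whose name starts with `prefix`, with the prefix removed."""
    stripped: Dict[str, T] = {}
    for name, tensor in tensors.items():
        if name.startswith(prefix):
            stripped[name[len(prefix):]] = tensor
    return stripped


if __name__ == "__main__":
    # Toy stand-in for a checkpoint's name -> tensor mapping (hypothetical names).
    checkpoint = {
        "text_decoder.model.embed_tokens.weight": "decoder-embeddings",
        "text_decoder.model.layers.0.mlp.up_proj.weight": "decoder-mlp",
        "speech_encoder.conv1.weight": "encoder-conv",
    }
    for name in strip_decoder_prefix(checkpoint):
        print(name)
```

In practice the mapping would come from the model's weight files, and the renamed tensors would be saved to a separate directory alongside a Gemma2 config before running the standard conversion on it.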

✨ New Features

  • Added support for A*STAR's MERaLiON-2 multimodal audio-language model (3B and 10B) to the multimodal framework.
  • The MERaLiON-2 architecture includes a Whisper large-v2 encoder for audio feature extraction, a Gated MLP adaptor, and a Gemma2 3B / 27B decoder.
  • Introduced new projector type: PROJECTOR_TYPE_MERALION.
  • MERaLiON-2 supports tasks including speech transcription (EN/ZH/MS/TA), translation, and spoken QA.

🐛 Bug Fixes

  • Simplified comments in the MERaLiON adaptor.
  • Used format_tensor_name and ASCII arrows in MERaLiON comments.

Affected Symbols