b8762
📦 llama-cpp
✨ 4 features · 🐛 2 fixes · 🔧 1 symbol
Summary
This release adds support for the MERaLiON-2 multimodal audio-language model, including its model-specific architecture components and supported tasks. It also includes minor cleanups of the comments in the MERaLiON adaptor.
Migration Steps
- When generating the mmproj GGUF for MERaLiON-2, use convert_hf_to_gguf.py --mmproj on the full model directory (architecture: MERaLiON2ForConditionalGeneration).
- The decoder must be converted separately as a standard Gemma2 model after stripping the text_decoder weight prefix.
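The prefix-stripping step amounts to renaming the checkpoint's tensor keys so the decoder looks like a plain Gemma2 model. Below is a minimal sketch of that rename, assuming the decoder tensors are stored under keys beginning with "text_decoder." (the exact prefix form, including the trailing dot, is an assumption). The helper name strip_decoder_prefix is hypothetical and is not part of convert_hf_to_gguf.py; loading and saving the actual safetensors shards is left out.

```python
# Hypothetical helper, not part of llama.cpp: keep only the decoder tensors of a
# MERaLiON-2 checkpoint and strip their prefix so the keys match a plain Gemma2 model.

# ASSUMPTION: decoder weights live under keys starting with this exact prefix.
DECODER_PREFIX = "text_decoder."


def strip_decoder_prefix(state_dict: dict, prefix: str = DECODER_PREFIX) -> dict:
    """Return the decoder tensors with the prefix removed from each key.

    Keys without the prefix (e.g. encoder or adaptor weights, which belong in
    the mmproj GGUF instead) are dropped from the result.
    """
    return {
        key[len(prefix):]: tensor
        for key, tensor in state_dict.items()
        if key.startswith(prefix)
    }
```

Tensors outside the prefix are dropped rather than passed through, since the encoder and adaptor weights are handled by the separate --mmproj conversion.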
✨ New Features
- Added support for A*STAR's MERaLiON-2 multimodal audio-language model (3B and 10B) to the multimodal framework.
- The MERaLiON-2 architecture comprises a Whisper large-v2 encoder for audio feature extraction, a gated MLP adaptor, and a Gemma2 3B / 27B decoder.
- Introduced new projector type: PROJECTOR_TYPE_MERALION.
- MERaLiON-2 supports speech transcription (English, Chinese, Malay, Tamil), translation, and spoken question answering.
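To make the encoder, adaptor, decoder data flow concrete, here is a generic gated MLP applied to each audio frame, written in plain Python. It is a sketch of the general technique, not MERaLiON's actual adaptor: the SiLU gate, the weight names w_gate / w_up / w_down, and the absence of biases, normalization, or frame downsampling are all assumptions. Each encoder frame is projected independently into the decoder's embedding width.

```python
import math

Vector = list[float]
Matrix = list[list[float]]  # row-major: out_dim rows, each of length in_dim


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(v: float) -> float:
    """SiLU (x * sigmoid(x)), computed without overflow for large |v|."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def gated_mlp(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """Generic gated MLP: down(silu(gate(x)) * up(x)). ASSUMED form, not MERaLiON's."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def project_audio_frames(
    frames: list[Vector], w_gate: Matrix, w_up: Matrix, w_down: Matrix
) -> list[Vector]:
    """Map each encoder frame (encoder width) to one embedding (decoder width)."""
    return [gated_mlp(f, w_gate, w_up, w_down) for f in frames]
```

In a real model the weights are learned and the output width must match the decoder's hidden size; tiny hand-written matrices are enough to exercise the shapes.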
🐛 Bug Fixes
- Simplified comments in the MERaLiON adaptor.
- Used format_tensor_name and ASCII arrows in the MERaLiON comments.