b8885
📦 llama-cpp
✨ 5 features · 🐛 4 fixes · 🔧 4 symbols
Summary
This release introduces comprehensive support for the HunyuanVL vision-language model, including new architecture definitions and GGUF conversion capabilities. Several bug fixes address conversion issues and CI errors related to the new model integration.
Migration Steps
- When converting models, pass an explicit `dir_model: Path` argument to `HunyuanVLTextModel.__init__` if you encounter type-check errors during HF-to-GGUF conversion.
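The workaround above can be sketched as follows. The class names echo `convert_hf_to_gguf.py` conventions, but the base class and the exact constructor signature here are illustrative assumptions, not the script's actual code:

```python
from pathlib import Path

# Hypothetical base class standing in for the converter's text-model base;
# the real hierarchy in convert_hf_to_gguf.py differs.
class TextModelBase:
    def __init__(self, dir_model: Path, *args, **kwargs):
        self.dir_model = dir_model

class HunyuanVLTextModel(TextModelBase):
    def __init__(self, dir_model: Path, *args, **kwargs):
        # Passing dir_model through explicitly (rather than via **kwargs)
        # lets static type checkers resolve the parameter's type.
        super().__init__(dir_model, *args, **kwargs)

model = HunyuanVLTextModel(Path("models/HunyuanVL"))
```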
✨ New Features
- Added support for the HunyuanVL vision-language model.
- Implemented `LLM_ARCH_HUNYUAN_VL` with M-RoPE (XD-RoPE) support.
- Introduced `PROJECTOR_TYPE_HUNYUANVL` utilizing the PatchMerger vision encoder.
- Added HunyuanVL-specific M-RoPE position encoding for image tokens.
- Enabled GGUF conversion for HunyuanVL vision and text models.
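M-RoPE generalizes rotary embeddings by giving each token a position per axis: text tokens advance all axes together, while image tokens vary along height and width. The sketch below illustrates that general idea only; HunyuanVL's actual XD-RoPE section layout is not described in this release, and the helper is hypothetical:

```python
# Illustrative sketch of M-RoPE-style multi-axis positions. The (t, h, w)
# split mirrors the general M-RoPE concept; the exact scheme HunyuanVL
# uses for its XD-RoPE sections is an assumption.
def mrope_positions(n_text: int, grid_h: int, grid_w: int):
    """Return (t, h, w) position triples: text tokens share the same index
    on every axis; image patches get per-axis height/width indices."""
    pos = [(i, i, i) for i in range(n_text)]  # text: all axes in lockstep
    base = n_text  # image block starts where the text positions left off
    for y in range(grid_h):
        for x in range(grid_w):
            pos.append((base, base + y, base + x))  # image: axes diverge
    return pos

positions = mrope_positions(2, 2, 2)
```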
🐛 Bug Fixes
- Fixed HunyuanVL XD-RoPE hardware section order.
- Fixed the HunyuanOCR / HunyuanVL to GGUF conversion process; successful inference verified on Metal.
- Fixed -Werror=misleading-indentation warning in bilinear resize within the clip module.
- Resolved a type-check error in `convert_hf_to_gguf.py` by explicitly passing `dir_model: Path` to `HunyuanVLTextModel.__init__`.