Change8

b8885

📦 llama-cpp
✨ 5 features · 🐛 4 fixes · 🔧 4 symbols

Summary

This release introduces comprehensive support for the HunyuanVL vision-language model, including new architecture definitions and GGUF conversion capabilities. Several bug fixes address conversion issues and CI errors related to the new model integration.

Migration Steps

  1. When converting models, ensure HunyuanVLTextModel.__init__ receives an explicit `dir_model: Path` parameter if encountering type check errors during HF to GGUF conversion.
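The step above can be sketched as follows. This is a minimal, self-contained illustration of the pattern, not the actual `convert_hf_to_gguf.py` code: `TextModelBase` is a simplified stand-in for the real base class, and the constructor signature is assumed for illustration.

```python
from pathlib import Path

# Simplified stand-in for the base model class in convert_hf_to_gguf.py.
class TextModelBase:
    def __init__(self, dir_model: Path, *args, **kwargs):
        self.dir_model = dir_model

class HunyuanVLTextModel(TextModelBase):
    # Declaring dir_model explicitly with a `Path` annotation (instead of
    # letting it fall into *args) gives the type checker a concrete type
    # to verify, which is what resolves the reported error.
    def __init__(self, dir_model: Path, *args, **kwargs):
        super().__init__(dir_model, *args, **kwargs)

model = HunyuanVLTextModel(Path("models/hunyuan-vl"))
```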

✨ New Features

  • Added support for the HunyuanVL vision-language model.
  • Implemented LLM_ARCH_HUNYUAN_VL with M-RoPE (XD-RoPE) support.
  • Introduced PROJECTOR_TYPE_HUNYUANVL utilizing the PatchMerger vision encoder.
  • Added HunyuanVL-specific M-RoPE position encoding for image tokens.
  • Enabled GGUF conversion for HunyuanVL vision and text models.
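To make the M-RoPE feature concrete: M-RoPE-style schemes assign each image token a multi-dimensional position (temporal, height, width) instead of a single scalar index. The sketch below illustrates that idea only; the actual section sizes, ordering, and XD-RoPE specifics for HunyuanVL live in llama.cpp, and the function name and layout here are illustrative assumptions.

```python
# Hedged sketch: 3D (temporal, height, width) positions for a grid of
# image tokens, in the spirit of M-RoPE. Not the llama.cpp implementation.
def mrope_positions(start_pos: int, grid_h: int, grid_w: int):
    """Return one (t, y, x) position triple per image token, row-major."""
    positions = []
    for y in range(grid_h):
        for x in range(grid_w):
            # Temporal position stays fixed for a single image; height and
            # width positions advance with the token's grid coordinates.
            positions.append((start_pos, start_pos + y, start_pos + x))
    return positions
```

A 2x2 image grid starting at text position 5 yields four triples, e.g. the bottom-right token gets `(5, 6, 6)`.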

🐛 Bug Fixes

  • Fixed HunyuanVL XD-RoPE hardware section order.
  • Fixed the GGUF conversion process for HunyuanOCR / HunyuanVL; successful inference verified on Metal.
  • Fixed -Werror=misleading-indentation warning in bilinear resize within the clip module.
  • Resolved type check error in convert_hf_to_gguf.py by explicitly providing `dir_model: Path` to HunyuanVLTextModel.__init__.

Affected Symbols