Change8

4.55.0-GLM-4.5V-preview

📦 transformers
6 features🔧 2 symbols

Summary

This release introduces GLM-4.5V, a high-performance multimodal reasoning model based on GLM-4.5-Air, featuring advanced capabilities in image, video, and GUI analysis.

Migration Steps

  1. Install the specific release branch using: pip install transformers-v4.55.0-GLM-4.5V-preview

✨ New Features

  • Integration of GLM-4.5V, a multimodal reasoning model with 106B total and 12B active parameters.
  • Support for image reasoning including scene understanding and spatial recognition.
  • Support for video understanding including long video segmentation and event recognition.
  • Support for GUI tasks such as screen reading and desktop operation assistance.
  • Support for complex chart and long document parsing.
  • Support for grounding and precise visual element localization.

🔧 Affected Symbols

AutoProcessorGlm4vMoeForConditionalGeneration