4.55.0-GLM-4.5V-preview
📦 transformers
✨ 6 features🔧 2 symbols
Summary
This release introduces GLM-4.5V, a high-performance multimodal reasoning model based on GLM-4.5-Air, featuring advanced capabilities in image, video, and GUI analysis.
Migration Steps
- Install the specific release branch using: pip install transformers-v4.55.0-GLM-4.5V-preview
✨ New Features
- Integration of GLM-4.5V, a multimodal reasoning model with 106B total and 12B active parameters.
- Support for image reasoning including scene understanding and spatial recognition.
- Support for video understanding including long video segmentation and event recognition.
- Support for GUI tasks such as screen reading and desktop operation assistance.
- Support for complex chart and long document parsing.
- Support for grounding and precise visual element localization.
🔧 Affected Symbols
AutoProcessorGlm4vMoeForConditionalGeneration