v5.4.0

📦 transformers
9 features 🔧 9 symbols

Summary

This release adds new models across several domains: video segmentation (VidEoMT), document processing (UVDoc, SLANeXt, and the PP-OCRv5 series), text embeddings (Jina-Embeddings-v3), large language models (Mistral 4), and robotics (PI0).

✨ New Features

  • Added VidEoMT, a lightweight encoder-only model for online video segmentation built on ViT.
  • Added UVDoc model for document image rectification and correction, supporting single input and batched inference.
  • Added Jina-Embeddings-v3, a multilingual, multi-task text embedding model based on XLM-RoBERTa supporting RoPE and task-specific LoRA adapters.
  • Added Mistral 4, a hybrid MoE model unifying Instruct, Reasoning, and Devstral capabilities, supporting multimodal input and 256k context length.
  • Added PI0, a vision-language-action model for robotics manipulation using a flow matching architecture.
  • Added SLANeXt series of lightweight models for table structure recognition, with separate weights for wired and wireless tables.
  • Added PP-OCRv5_mobile_rec model for efficient, multi-language text recognition supporting complex scenarios like handwriting and vertical text.
  • Added PP-OCRv5_server_rec model for efficient, multi-language text recognition supporting complex scenarios like handwriting and vertical text.
  • Added PP-OCRv5_mobile_det model for efficient, multi-language text detection supporting diverse scenarios including handwriting, vertical, rotated, and curved text.
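
Since the models listed above are introduced in v5.4.0, it can help to confirm the installed version before trying to load any of them. The following is a minimal sketch using only the Python standard library. The helper names (`release_tuple`, `meets_minimum`, `installed_version`) are hypothetical and not part of transformers, and the comparison is a simplification: it looks only at the leading numeric components, so a pre-release such as `5.4.0.dev0` is treated as `5.4.0`.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_VERSION = "5.4.0"  # the release described in these notes


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '5.4.0.dev0' -> (5, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Compare numeric release components only (simplification: ignores pre-release tags)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    # Pad with zeros so that '5.4' and '5.4.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str = "transformers") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("transformers is not installed")
    elif meets_minimum(found):
        print(f"transformers {found} should include the v{MIN_VERSION} models")
    else:
        print(f"transformers {found} predates v{MIN_VERSION}; upgrade to use the new models")
```

For strict handling of pre-release and development versions, a full PEP 440 parser would be a better fit than this numeric comparison.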

Affected Symbols