
v5.5.2

📦 transformers
✨ 1 feature · 🐛 2 fixes · 🔧 1 symbol

Summary

This patch adds Mixture of Experts (MoE) support to the Gemma4 Tensor Parallelism plan, fixes an inference issue with k/v state sharing between layers when caching is disabled, and corrects weight-name serialization mappings for some vision-language models (VLMs).

✨ New Features

  • Added Mixture of Experts (MoE) support to the Gemma4 Tensor Parallelism (TP) plan.
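
A TP plan maps weight-name patterns to sharding styles (e.g. `"colwise"`/`"rowwise"`), and supporting MoE means the per-expert MLP weights need entries of their own. The sketch below is illustrative only: Gemma4's actual plan keys and the matching logic in transformers are assumptions here, modeled on how `base_model_tp_plan` dictionaries are typically written.

```python
import re

# Hypothetical plan entries; the real Gemma4 keys may differ.
GEMMA4_MOE_TP_PLAN = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
    # MoE addition (sketch): each expert's MLP weights get sharded too.
    "layers.*.mlp.experts.*.gate_proj": "colwise",
    "layers.*.mlp.experts.*.up_proj": "colwise",
    "layers.*.mlp.experts.*.down_proj": "rowwise",
}

def sharding_style(param_name, plan):
    """Return the sharding style for a parameter, treating '*' as a layer/expert index."""
    for pattern, style in plan.items():
        regex = "^" + re.escape(pattern).replace(r"\*", r"\d+") + "$"
        if re.match(regex, param_name):
            return style
    return None  # parameter is replicated, not sharded
```

For example, `sharding_style("layers.3.mlp.experts.7.down_proj", GEMMA4_MOE_TP_PLAN)` resolves to `"rowwise"`, while a parameter with no matching pattern resolves to `None`.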

🐛 Bug Fixes

  • Fixed an inference issue with `use_cache=False` in Gemma4, caused by k/v state sharing between layers.
  • Fixed inconsistent weight-name serialization in the conversion mappings for some vision-language models (VLMs).
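
The `use_cache=False` failure mode is easy to reproduce in miniature. The sketch below is not transformers' actual code, only an illustration of the bug class: when some layers reuse an earlier layer's k/v states, reading those states back out of the cache object breaks as soon as no cache exists. Carrying the shared states forward explicitly works in both modes.

```python
def make_layer(reuse_kv_from_prev):
    # Toy stand-in for an attention layer; real layers compute tensors.
    return {
        "reuse_kv_from_prev": reuse_kv_from_prev,
        "make_kv": lambda h: ("kv", h),
        "attend": lambda h, kv: (h, kv),
    }

def run_layers(layers, hidden, use_cache=True):
    cache = {} if use_cache else None
    shared_kv = None  # fix: hand shared k/v forward directly, not via the cache
    outputs = []
    for i, layer in enumerate(layers):
        if layer["reuse_kv_from_prev"]:
            kv = shared_kv              # valid even when use_cache=False
        else:
            kv = layer["make_kv"](hidden)
            shared_kv = kv
        if cache is not None:
            cache[i] = kv               # caching stays an optional side channel
        outputs.append(layer["attend"](hidden, kv))
    return outputs, cache
```

With this structure, a layer marked `reuse_kv_from_prev` still receives the previous layer's k/v states whether or not a cache is allocated.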

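The VLM serialization fix concerns how weight names are rewritten during checkpoint conversion. The sketch below is a hedged illustration, not the library's actual mapping tables: the rule patterns shown are hypothetical. The bug class is overlapping or order-dependent rename rules letting the same weight serialize under different names; applying an ordered rule list where the first match wins keeps the mapping deterministic.

```python
import re

# Hypothetical VLM rename rules (pattern, replacement); real mappings differ.
CONVERSION_RULES = [
    (r"^vision_model\.", "vision_tower."),
    (r"^language_model\.model\.", "model.language_model."),
]

def convert_key(key):
    """Rewrite a weight name using the first matching rule, exactly once."""
    for pattern, repl in CONVERSION_RULES:
        new_key, n = re.subn(pattern, repl, key)
        if n:
            return new_key  # first match wins; no double rewrites
    return key  # unmatched names pass through unchanged
```

Anchoring patterns with `^` and returning after the first match prevents a later rule from rewriting an already-converted name.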
Affected Symbols