v5.5.2
📦 transformers
✨ 1 feature · 🐛 2 fixes · 🔧 1 symbol
Summary
This patch adds Mixture of Experts (MoE) support to the Gemma4 tensor-parallelism plan, fixes an inference issue with k/v states shared between layers when caching is disabled, and corrects weight-name serialization mappings for some vision-language models (VLMs).
✨ New Features
- Added Mixture of Experts (MoE) support to the Gemma4 tensor-parallelism (TP) plan.
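As a rough illustration of what extending a TP plan for MoE looks like, the sketch below shards each expert's MLP projections the same way a dense MLP would be sharded. The module paths and the plan contents are assumptions for illustration, not the actual Gemma4 plan:

```python
# Hypothetical sketch of a tensor-parallelism (TP) plan extended with
# MoE expert modules. The module paths below are illustrative
# assumptions, not the real Gemma4 configuration.
base_model_tp_plan = {
    # Attention projections: split columns for q/k/v, rows for the output.
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
    # MoE additions: shard each expert's projections like a dense MLP,
    # with the wildcard ranging over the expert index.
    "layers.*.mlp.experts.*.gate_proj": "colwise",
    "layers.*.mlp.experts.*.up_proj": "colwise",
    "layers.*.mlp.experts.*.down_proj": "rowwise",
}
```

The colwise/rowwise pairing keeps each expert's up-projection and down-projection composable across shards without an extra all-gather in between.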
🐛 Bug Fixes
- Fixed an inference issue in Gemma4 with `use_cache=False`, caused by k/v states shared between layers.
- Fixed inconsistent weight-name serialization in the conversion mappings for some vision-language models (VLMs).
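The first fix concerns a general failure mode of cross-layer k/v sharing: if a layer reads the shared states back from the cache, disabling the cache breaks inference. The toy sketch below (an illustration of the bug class, not the actual transformers code) hands the shared states forward through the current pass so both cache modes produce the same output:

```python
# Toy sketch of cross-layer k/v sharing. Some layers reuse the k/v
# states produced by an earlier layer; the reused states must come from
# the current forward pass, not from the cache, so that
# use_cache=False still works. All names here are illustrative.
def forward(layers, hidden, use_cache=False):
    shared_kv = None            # k/v from the most recent producing layer
    past = [] if use_cache else None
    for layer in layers:
        if layer.get("reuse_kv") and shared_kv is not None:
            k, v = shared_kv    # taken from this pass, not from `past`
        else:
            k, v = hidden * layer["wk"], hidden * layer["wv"]
            shared_kv = (k, v)
        if use_cache:
            past.append((k, v))
        hidden = hidden + 0.1 * v  # stand-in for the attention output
    return hidden, past

# Two layers: the second reuses the first layer's k/v states.
layers = [{"wk": 1.0, "wv": 1.0}, {"reuse_kv": True}]
```

Running `forward(layers, 1.0, use_cache=False)` and `forward(layers, 1.0, use_cache=True)` yields identical hidden states, which is the invariant the fix restores.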
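For the second fix, a conversion mapping rewrites checkpoint weight names into the model's module layout; serialization stays consistent only if the same mapping is applied uniformly. The patterns and target prefixes below are assumptions for illustration, not the actual transformers mappings:

```python
import re

# Hypothetical weight-name conversion mapping for a VLM checkpoint.
# The source patterns and target prefixes are illustrative assumptions.
CONVERSION_MAPPING = {
    r"^vision_tower\.": "model.vision_tower.",
    r"^language_model\.": "model.language_model.",
}

def convert_key(name: str) -> str:
    """Rewrite a checkpoint weight name using the mapping above.

    Applying the mapping consistently on both save and load keeps
    serialized weight names round-trippable; names that match no
    pattern pass through unchanged.
    """
    for pattern, replacement in CONVERSION_MAPPING.items():
        name = re.sub(pattern, replacement, name)
    return name
```

For example, `convert_key("vision_tower.encoder.layer.0.weight")` maps under the `model.` prefix, while an unmatched name like `lm_head.weight` is left untouched.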