v0.19.1

📦 vllm
✨ 2 features · 🐛 7 fixes · 🔧 2 symbols

Summary

This patch release upgrades to Transformers v5.5.4 and fixes bugs in Gemma4 streaming, tool calls, and model loading, along with a placeholder-token fix for kimi_k25. It also adds Gemma4 Eagle3 support and quantized MoE support for Gemma4.

Migration Steps

  1. Adjust requests to use the reasoning parser for Gemma4, as specified in the tool adjustments.

✨ New Features

  • Support quantized MoE for Gemma4.
  • Add Gemma4 Eagle3 support.

🐛 Bug Fixes

  • Fix invalid JSON in Gemma4 streaming tool calls by stripping partial delimiters.
  • Fix Gemma4 streaming HTML duplication after tool calls.
  • Fix Gemma4 streaming tool call corruption for split boolean/number values.
  • Fix Gemma4 tool parser converting bare null to string "null".
  • Fix Gemma4 token repetition by dynamically injecting the BOS token for pretrained (PT) models.
  • Resolve media_placeholder_token_id from tokenizer for kimi_k25.
  • Enable Gemma4ForCausalLM to load LoRA adapters correctly.
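
The bare-null fix in the list above is easiest to see with a small example. The following is a minimal sketch and not vLLM's actual parser code: `parse_bare_value` is a hypothetical helper, and it assumes that unquoted argument values can be decoded as JSON scalars. The point it illustrates is that a bare `null` should become a real null in the tool-call arguments, not the four-character string "null".

```python
import json


def parse_bare_value(raw: str):
    """Hypothetical helper: decode one unquoted tool-call argument value.

    Bare JSON scalars (null, true, false, numbers) keep their type;
    anything that is not valid JSON falls back to a plain string.
    """
    text = raw.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


# A bare null stays a JSON null when the arguments are re-serialized.
arguments = {"limit": parse_bare_value("null"), "city": parse_bare_value("Paris")}
serialized = json.dumps(arguments)
print(serialized)
```

With a parser that stringifies every bare value, the same arguments would serialize with `"limit": "null"`, which a strictly typed tool schema would likely reject or misinterpret.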

Affected Symbols