v0.10.1rc1
📦 vllm
⚠ 1 breaking · ✨ 11 features · 🐛 10 fixes · ⚡ 1 deprecation · 🔧 8 symbols
Summary
This release introduces model loader plugins, official Emu3 support, and significant performance optimizations for MoE kernels and FlashInfer. It also includes critical bug fixes for TPU, ROCm, and various quantization backends.
⚠️ Breaking Changes
- The command line argument '--expand-tools-even-if-tool-choice-none' has been replaced with '--exclude-tools-when-tool-choice-none'. Users must update their startup scripts to use the new flag.
Migration Steps
- Update deployment scripts: replace '--expand-tools-even-if-tool-choice-none' with '--exclude-tools-when-tool-choice-none'.
- Upgrade flashinfer to v0.2.9rc1.
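The flag rename above can be checked mechanically before a rollout. The following is a minimal sketch, not part of vLLM: the helper names `find_old_flag` and `migrate_flag` and the sample script are hypothetical and only illustrate the textual swap described in the migration step. The two flag names read as opposites, so it is worth confirming against the vLLM CLI documentation that a straight swap gives the behavior you want.

```python
import re

OLD_FLAG = "--expand-tools-even-if-tool-choice-none"
NEW_FLAG = "--exclude-tools-when-tool-choice-none"

# Match the old flag only as a whole token, not as a prefix of a longer flag.
_OLD_FLAG_RE = re.compile(re.escape(OLD_FLAG) + r"(?![\w-])")


def find_old_flag(script_text: str) -> list[int]:
    """Return the 1-based line numbers that still mention the old flag."""
    return [
        lineno
        for lineno, line in enumerate(script_text.splitlines(), start=1)
        if _OLD_FLAG_RE.search(line)
    ]


def migrate_flag(script_text: str) -> str:
    """Return script_text with each occurrence of the old flag replaced."""
    return _OLD_FLAG_RE.sub(NEW_FLAG, script_text)


# Hypothetical startup script used only for illustration.
example = (
    "#!/bin/sh\n"
    "vllm serve my-model \\\n"
    "  --expand-tools-even-if-tool-choice-none\n"
)

print(find_old_flag(example))   # line numbers that need attention
print(migrate_flag(example))    # script text after the textual swap
```

Running `find_old_flag` over deployment scripts first, and reviewing each hit by hand, is likely safer than applying `migrate_flag` blindly.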
✨ New Features
- Support for model loader plugins.
- Official support for Emu3 with Transformers backend.
- Support for custom naming of vLLM processes.
- Support for CPU Transfer in NixlConnector.
- Added support for naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B.
- Added support for Prithvi in Online serving mode.
- Tensor-parallel support for the timm ViT in Deepseek_vl2.
- Enable BitsAndBytes (BNB) support for more MoE models.
- Added request_id to the Request object for better external load balancer control.
- Enable FP8 KV cache on ROCm AITER backend.
- Added fused_moe configs for Granite4 and Qwen3-Coder-480B-A35B-Instruct.
🐛 Bug Fixes
- Fix host-side uses of warp_size on ROCm.
- Fix MoE layer and OOM issues on TPU backend.
- Fix duplicate FusedMoEConfig debug messages.
- Fix CUDA arch flags for MoE permute.
- Fix Compressed Tensor NVFP4 illegal memory access in cutlass_fp4_group_mm.
- Fix DeepGemm initialization and hardcoded type-cast errors.
- Fix GLM-4 pipeline parallelism missing layer issue.
- Fix GGUF AttributeError related to PosixPath startswith.
- Fix modelscope snapshot_download serialization.
- Fix logprobs op to support more backends.
🔧 Affected Symbols
- SpecializedManager
- FusedMoEConfig
- DPEngineCoreActor._set_cuda_visible_devices
- FlashInfer.MetadataBuilder
- NixlConnector
- Phi3VImagePixelInputs
- Gemma3.vision_embeddings
- Mamba2.RMSNorm
⚡ Deprecations
- Deprecated '--expand-tools-even-if-tool-choice-none' in favor of '--exclude-tools-when-tool-choice-none'.