v0.11.2

📅 Nov 20, 2025 📦 vllm

🐛 4 fixes 🔧 5 symbols

Summary

This release provides four critical bug fixes: Ray multi-node support, a false assertion in speculative decoding under tensor parallelism, async scheduling with FlashAttention MLA, and the build guard for the SM100 CUTLASS MoE macro.

Fixed Ray multi-node support issues
Fixed a false assertion triggered when using speculative decoding with values [2, 4, ...] and tensor parallelism > 2
Fixed compatibility between async-scheduling and FlashAttention MLA
Restricted the SM100 CUTLASS MoE macro to SM100-specific builds

Ray, spec-decode, FlashAttn MLA, async-scheduling, CUTLASS MoE