v0.11.2

📦 vllm
🐛 4 fixes · 🔧 5 symbols

Summary

This release contains four bug fixes: Ray multi-node support, a false assertion in speculative decoding under tensor parallelism, compatibility between async scheduling and FlashAttention MLA, and a build guard for the SM100 CUTLASS MoE macro.

🐛 Bug Fixes

  • Fixed Ray multi-node support issues
  • Fixed a false assertion triggered when speculative decoding is used with values [2, 4, ...] and tensor parallelism > 2
  • Fixed compatibility between async-scheduling and FlashAttention MLA
  • Restricted the SM100 CUTLASS MoE macro to SM100 builds
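
To check whether an environment already includes these fixes, one option is to compare the installed package version against 0.11.2. The following is a minimal sketch, not part of vLLM: the helper names (`parse_release`, `has_v0_11_2_fixes`, `installed_vllm_version`) are hypothetical, and it assumes the package is installed under the distribution name `vllm`. The comparison is deliberately simple and ignores version suffixes, so treat the result as approximate for development builds.

```python
from __future__ import annotations

from importlib import metadata


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric components, e.g. '0.11.2+cu128' -> (0, 11, 2)."""
    parts = []
    # Drop any local build tag after '+', then read dot-separated integers.
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric piece such as 'post1' or '2rc1'
        parts.append(int(piece))
    return tuple(parts)


def has_v0_11_2_fixes(installed: str) -> bool:
    """Return True if the numeric release is 0.11.2 or later (suffixes ignored)."""
    return parse_release(installed) >= (0, 11, 2)


def installed_vllm_version() -> str | None:
    """Return the installed version string, or None if 'vllm' is not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vllm is not installed")
    elif has_v0_11_2_fixes(found):
        print(f"vllm {found} includes the v0.11.2 fixes")
    else:
        print(f"vllm {found} predates v0.11.2")
```

Because a release candidate such as `0.11.2rc1` stops parsing at `2rc1`, it compares as earlier than 0.11.2, which is the conservative outcome for this check.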

🔧 Affected Symbols

  • Ray
  • spec-decode
  • FlashAttn MLA
  • async-scheduling
  • CUTLASS MoE