v0.11.2
📦 vllm
🐛 4 fixes · 🔧 5 symbols
Summary
This release provides four critical bug fixes: Ray multi-node support, a false assertion in speculative decoding, compatibility between async-scheduling and FlashAttention MLA, and the guard on the SM100 CUTLASS MoE macro.
🐛 Bug Fixes
- Fixed Ray multi-node support issues
- Fixed false assertion when using speculative decoding with values [2, 4, ...] and Tensor Parallelism > 2
- Fixed compatibility between async-scheduling and FlashAttention MLA
- Guarded the SM100 CUTLASS MoE macro so that it applies only to specific SM100 builds
🔧 Affected Symbols
Ray, spec-decode, FlashAttn MLA, async-scheduling, CUTLASS MoE