Migrating to vLLM v0.20.0
Version v0.20.0 introduces 2 breaking changes. This guide details how to update your code.
Released: 4/23/2026
⚠️ Check Your Code
If you use any of these symbols, you need to read this guide:
- torch.compile
- rms_norm
- PaddleOCR-VL image processor
- Mistral YaRN
- Eagle prefill
- OffloadingConnector
- RayExecutorV2
- zentorch
- GDN attention
- Lustre FS
Breaking Changes
- Issue #1: Default CUDA wheel switched to CUDA 13.0, following PyTorch's version policy. Users must ensure their environment matches this new default or explicitly build/install for a different CUDA version.
- Issue #2: vLLM now ships on PyTorch 2.11 for CUDA environments. XPU environments temporarily remain on torch-xpu 2.10. This breaks environments that rely on older PyTorch versions.
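Before upgrading, it can help to verify that the installed PyTorch and CUDA versions meet the new minimums. The sketch below is a minimal, unofficial helper (not part of vLLM); the version strings and the `meets_minimum` helper are assumptions for illustration.

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Parse a dotted version string like "2.11.0+cu130" into integers,
    dropping any local suffix after "+"."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def meets_minimum(installed: str, required: str) -> bool:
    """True if the installed version is at least the required version."""
    return version_tuple(installed) >= version_tuple(required)


if __name__ == "__main__":
    # Hypothetical live check against the new vLLM v0.20.0 minimums:
    try:
        import torch

        ok_torch = meets_minimum(torch.__version__, "2.11")
        ok_cuda = torch.version.cuda is not None and meets_minimum(
            torch.version.cuda, "13.0"
        )
        print(f"torch>=2.11: {ok_torch}, cuda>=13.0: {ok_cuda}")
    except ImportError:
        print("PyTorch is not installed in this environment")
```

Note that XPU environments are the exception here: per the release notes they temporarily stay on torch-xpu 2.10, so this check only applies to CUDA builds.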
Migration Steps
1. Ensure your environment uses a PyTorch build compatible with CUDA 13.0 if you rely on the default CUDA wheel, or manage PyTorch versioning carefully if you use XPU.
2. If you use features that rely on older Transformers versions, be aware of potential compatibility changes from the move to supporting transformers>=5.
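The Transformers constraint in step 2 can be recorded as an explicit pin so the upgrade is reproducible. The fragment below is an illustrative requirements.txt sketch, not an official pin set:

```
# Illustrative pins for the v0.20.0 upgrade (versions are assumptions)
vllm==0.20.0
transformers>=5
```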
Release Summary
v0.20.0 introduces major infrastructure upgrades, including a default switch to CUDA 13.0 and PyTorch 2.11, alongside significant performance enhancements like TurboQuant 2-bit KV cache and the re-enabling of FlashAttention 4 as default prefill backend.
Need More Details?
View the full release notes and all changes for vLLM v0.20.0.