
Migrating to vLLM v0.20.0

Version v0.20.0 introduces 2 breaking changes. This guide details how to update your code.

Released: 4/23/2026

2 Breaking Changes · 2 Migration Steps · 10 Affected Symbols

⚠️ Check Your Code

If you use any of these symbols, you need to read this guide:

torch.compile, rms_norm, PaddleOCR-VL image processor, Mistral YaRN, Eagle prefill, OffloadingConnector, RayExecutorV2, zentorch, GDN attention, Lustre FS

Breaking Changes

Issue #1

Default CUDA wheel switched to CUDA 13.0, following PyTorch's version policy. Users must ensure their environment matches this new default or explicitly build/install for a different CUDA version.

Issue #2

vLLM now ships on PyTorch 2.11 for CUDA environments. XPU environments temporarily remain on torch-xpu 2.10. This breaks environments relying on older PyTorch versions.
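A version floor like this can be enforced with a small comparator. The sketch below is a simplified version parser (release segments only, no pre-release or local-segment semantics; for production code prefer `packaging.version`), and the function names are ours:

```python
# Minimal sketch: gate on the new PyTorch 2.11 floor for CUDA environments.
# parse_version keeps only leading numeric segments, so a local version
# suffix such as "+cu130" is stripped before comparison.

def parse_version(v: str) -> tuple[int, ...]:
    """'2.11.0+cu130' -> (2, 11, 0); stops at the first non-numeric segment."""
    release = v.split("+")[0]
    parts: list[int] = []
    for segment in release.split("."):
        if segment.isdigit():
            parts.append(int(segment))
        else:
            break
    return tuple(parts)

def meets_floor(installed: str, floor: str = "2.11") -> bool:
    """True when the installed version is at or above the required floor."""
    return parse_version(installed) >= parse_version(floor)
```

Tuple comparison gives the expected ordering here: `(2, 10, 1) < (2, 11)`, so a PyTorch 2.10 install is correctly rejected.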

Migration Steps

  1. Ensure your environment uses a PyTorch build compatible with CUDA 13.0 if you use the default CUDA wheel, or manage PyTorch versioning carefully if you use XPU.
  2. If you rely on features tied to older Transformers versions, be aware of potential compatibility shifts from the move to transformers>=5.

Release Summary

v0.20.0 introduces major infrastructure upgrades, including a default switch to CUDA 13.0 and PyTorch 2.11, alongside significant performance enhancements like TurboQuant 2-bit KV cache and the re-enabling of FlashAttention 4 as default prefill backend.

Need More Details?

View the full release notes and all changes for vLLM v0.20.0.
