
Migrating to vLLM v0.18.0

vLLM v0.18.0 introduces two breaking changes. This guide explains how to update your code.

Released: March 20, 2026

2 Breaking Changes · 3 Migration Steps · 24 Affected Symbols

⚠️ Check Your Code

If you use any of these symbols, you need to read this guide:

torch.compile, CUBLAS_STATUS_INVALID_VALUE, vllm.launch, FlashInfer, OpenAI Responses API, WhisperModelState, XD-RoPE, NIXL-EP, DeepSeek-V3.2, Qwen3.5, Qwen3-VL, Qwen3-Next, MiniCPM-V, MiniCPM-O, Qwen2.5-Omni, Qwen3-Omni, DeepSeek-OCR, LFM2, SigLIP/CLIP Transformers v5, FusedMoE, FusedRMSNormGated, Mamba2 SSD, DeepEP, LMCache

Breaking Changes

Issue #1

Ray is no longer a default dependency. Users who rely on Ray for distributed execution must now install it explicitly.
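Because Ray is no longer pulled in automatically, code that assumes its presence will now fail at import time. A minimal, stdlib-only sketch (not vLLM code) that detects the missing dependency up front and points the user at the fix:

```python
import importlib.util


def ray_available() -> bool:
    # Ray is an optional dependency in vLLM v0.18.0; detect it
    # without importing it (find_spec does not load the module).
    return importlib.util.find_spec("ray") is not None


if not ray_available():
    print("Ray not found; install it explicitly: pip install ray")
```

Checking with `find_spec` rather than a bare `import ray` avoids paying the import cost (and any import-time side effects) just to test for availability.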

Issue #2

Cascade attention is disabled by default. If you relied on its previous default behavior, you may need to explicitly enable it.

Migration Steps

  1. If you encountered `CUBLAS_STATUS_INVALID_VALUE` in v0.17.0, reinstall `torch 2.10.0`; PyTorch published a fix.
  2. If you rely on Ray for distributed execution, install it explicitly (e.g., `pip install ray`).
  3. If you relied on cascade attention being enabled by default, enable it explicitly.

Release Summary

v0.18.0 introduces major features like gRPC serving, GPU-less render serving, and significant improvements to KV cache offloading and Elastic Expert Parallelism. Ray is now an optional dependency, and numerous model-specific fixes and kernel optimizations have been integrated.

Need More Details?

View the full release notes and all changes for vLLM v0.18.0.
