Migrating to vLLM v0.18.0
Version v0.18.0 introduces two breaking changes. This guide details how to update your code.
Released: 3/20/2026
⚠️ Check Your Code
If you use any of these symbols, you need to read this guide:
`torch.compile`, `CUBLAS_STATUS_INVALID_VALUE`, `vllm.launch`, FlashInfer, OpenAI Responses API, `WhisperModelState`, XD-RoPE, NIXL-EP, DeepSeek-V3.2, Qwen3.5, Qwen3-VL, Qwen3-Next, MiniCPM-V, MiniCPM-O, Qwen2.5-Omni, Qwen3-Omni, DeepSeek-OCR, LFM2, SigLIP/CLIP Transformers v5, `FusedMoE`, `FusedRMSNormGated`, Mamba2 SSD, DeepEP, LMCache
Breaking Changes
● Issue #1: Ray is no longer a default dependency. Users who rely on Ray for distributed execution must now install it explicitly.
● Issue #2: Cascade attention is disabled by default. If you relied on its previous default behavior, you may need to explicitly enable it.
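A minimal sketch of re-enabling cascade attention when serving. The flag name below is an assumption for illustration, not confirmed for v0.18.0; check `vllm serve --help` or the release notes for the actual option.

```shell
# Hypothetical flag shown for illustration only — verify the real
# option name in v0.18.0 before use (e.g. via `vllm serve --help`).
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --enable-cascade-attn
```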
Migration Steps
1. If you previously encountered `CUBLAS_STATUS_INVALID_VALUE` in v0.17.0, reinstall `torch 2.10.0`, as PyTorch published a fix.
2. If you rely on Ray for distributed execution, install it explicitly (e.g., `pip install ray`).
3. If you relied on cascade attention being enabled by default, enable it explicitly in your engine configuration.
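The steps above can be sanity-checked after upgrading with a small standard-library script. `check_migration_deps` is a hypothetical helper written for this guide, not part of vLLM; it only reports which packages are installed and at what version.

```python
# Sketch: verify the two migration-relevant dependencies after upgrading.
# The torch version to look for (2.10.0) comes from step 1 above; Ray is
# optional as of v0.18.0 (step 2), so its absence is fine unless you use
# distributed execution.
from importlib.metadata import PackageNotFoundError, version


def check_migration_deps() -> dict:
    """Report installed versions of torch and ray (None if absent)."""
    findings = {}
    for pkg in ("torch", "ray"):
        try:
            findings[pkg] = version(pkg)
        except PackageNotFoundError:
            findings[pkg] = None
    return findings


report = check_migration_deps()
print(report)
```

If `report["torch"]` is not `2.10.0`, reinstall torch per step 1; if `report["ray"]` is `None` and you need distributed execution, run `pip install ray`.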
Release Summary
v0.18.0 introduces major features like gRPC serving, GPU-less render serving, and significant improvements to KV cache offloading and Elastic Expert Parallelism. Ray is now an optional dependency, and numerous model-specific fixes and kernel optimizations have been integrated.
Need More Details?
View the full release notes and all changes for vLLM v0.18.0.