Migrating to vLLM v0.24.0

vLLM v0.24.0 introduces two breaking changes, both related to device selection. This guide explains how to update your code.

Released: 6/29/2026

Affected Symbols

⚠️ Check Your Code

If you use any of these symbols, you need to read this guide:

MiniMax-M3, DeepSeek-V4, Model Runner V2 (MRv2), GraniteMoE, Qwen, DeepSeek-V2 MoE, Streaming Parser Engine, Qwen3, MiniMax-M2, GLM-4.7, GLM-5.1, GLM-5.2, Nemotron V3, DiffusionGemma, DeepEP v2, `CUDA_VISIBLE_DEVICES`, `device_ids`, Gemma 4, FlashAttention (FA4), Qwen3-VL, Qwen2-VL, Qwen2.5-VL, Qwen3.5, GLM-4.1V, DeepSeek-OCR, Kimi-VL, mllama4, Lfm2VL, Llama4, MiMo v2.x, ColQwen3.5, EXAONE-4.5, MiDashengLM, MiniCPM-o/V, Cohere2 MoE, ColBERT AutoWeightsLoader, GLM-5, NIXL EP, P2pNcclConnector, Mamba, DeepGEMM

Breaking Changes

● Issue #1

vLLM no longer sets the `CUDA_VISIBLE_DEVICES` environment variable internally. You must now specify target devices explicitly with the new `device_ids` argument when initializing the engine or API server.

● Issue #2

On ROCm platforms, `CUDA_VISIBLE_DEVICES` is now deprecated and is expected to be removed in a future release. Use the `device_ids` argument instead.
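One way to find launch scripts that still depend on the variable is a small preflight check, sketched below. The helper name `warn_if_legacy_device_env` is hypothetical and not part of vLLM; it only inspects the environment and emits a standard Python `DeprecationWarning`.

```python
import os
import warnings

LEGACY_VAR = "CUDA_VISIBLE_DEVICES"


def warn_if_legacy_device_env(env=None):
    """Return True and emit a DeprecationWarning if the legacy variable is set.

    `env` defaults to os.environ; pass a plain dict to check another mapping.
    This is a hypothetical helper for illustration, not a vLLM API.
    """
    env = os.environ if env is None else env
    if env.get(LEGACY_VAR):
        warnings.warn(
            f"{LEGACY_VAR} is set; pass device_ids explicitly instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return True
    return False


if __name__ == "__main__":
    # Run once at the top of a launch script, before creating the engine.
    warn_if_legacy_device_env()
```

Python hides `DeprecationWarning` in some contexts by default, so running the script with `python -W always` makes the message visible.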

Migration Steps

1. Wherever your code relied on vLLM setting `CUDA_VISIBLE_DEVICES` internally, pass the explicit `device_ids` argument when initializing vLLM components instead (e.g., `LLM(..., device_ids=[0, 1])`).
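The step above could be sketched as follows for deployments that already keep a device list in the environment. The helper name `device_ids_from_env` is hypothetical and not part of vLLM. It assumes the variable holds comma-separated integer indices (UUID-style entries are not handled), and the `LLM(...)` call is left as a comment because this guide documents only the `device_ids` argument.

```python
import os


def device_ids_from_env(env=None, var="CUDA_VISIBLE_DEVICES"):
    """Parse a comma-separated index list such as "0,1" into [0, 1].

    Returns None when the variable is unset or empty.
    Raises ValueError for non-integer entries (for example GPU UUIDs).
    Hypothetical helper for illustration, not a vLLM API.
    """
    env = os.environ if env is None else env
    raw = env.get(var, "").strip()
    if not raw:
        return None
    # Skip empty fragments so a trailing comma does not break parsing.
    return [int(part) for part in raw.split(",") if part.strip()]


ids = device_ids_from_env()
# Only pass device_ids when a list was actually found.
engine_kwargs = {"device_ids": ids} if ids is not None else {}

# Assumed usage, following the `LLM(..., device_ids=[0, 1])` form shown above:
# from vllm import LLM
# llm = LLM(model="<your-model>", **engine_kwargs)
```

Building the keyword arguments separately keeps the device list in one place, so the same values can be reused if several vLLM components are initialized.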

Release Summary

v0.24.0 adds extensive support and performance optimizations for new models such as MiniMax-M3 and DeepSeek-V4, matures Model Runner V2 with default quantization support, and overhauls device selection by removing internal use of `CUDA_VISIBLE_DEVICES`.
