Migrating to vLLM v0.24.0
Version v0.24.0 introduces 2 breaking changes. This guide details how to update your code.
Released: 6/29/2026
⚠️ Check Your Code
If you use any of these symbols, you need to read this guide:
MiniMax-M3, DeepSeek-V4, Model Runner V2 (MRv2), GraniteMoE, Qwen, DeepSeek-V2 MoE, Streaming Parser Engine, Qwen3, MiniMax-M2, GLM-4.7, GLM-5.1, GLM-5.2, Nemotron V3, Diffusion, Gemma, DeepEP v2, CUDA_VISIBLE_DEVICES, device_ids, Gemma 4, FlashAttention (FA4), Qwen3-VL, Qwen2-VL, Qwen2.5-VL, Qwen3.5, GLM-4.1V, DeepSeek-OCR, Kimi-VL, mllama4, Lfm2VL, Llama4, MiMo v2.x, ColQwen3.5, EXAONE-4.5, MiDashengLM, MiniCPM-o/V, Cohere2 MoE, ColBERT AutoWeightsLoader, GLM-5, NIXL EP, P2pNcclConnector, Mamba, DeepGEMM
Breaking Changes
● Issue #1
vLLM no longer sets the `CUDA_VISIBLE_DEVICES` environment variable internally. To target specific devices, pass the new `device_ids` argument when initializing the engine or API server.
● Issue #2
On ROCm platforms, `CUDA_VISIBLE_DEVICES` is now deprecated and slated for removal in a future release. Switch to the `device_ids` argument instead.
Migration Steps
1. Replace any reliance on vLLM setting `CUDA_VISIBLE_DEVICES` internally with the explicit `device_ids` argument when initializing vLLM components (e.g., `LLM(..., device_ids=[0, 1])`).
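The step above can be sketched as follows. This is a minimal illustration, not vLLM's own code: `make_llm_kwargs` is a hypothetical helper, the validation rules (non-empty, non-negative, no duplicates) are assumptions, and the model name is only a placeholder. The one detail taken from this guide is the `device_ids` keyword. The actual `LLM(...)` call is left as a comment so the sketch runs without vLLM or a GPU.

```python
def make_llm_kwargs(model, device_ids):
    """Build keyword arguments for vllm.LLM with an explicit device list.

    Hypothetical helper, not part of vLLM. `device_ids` is the argument named
    in this guide; the validation rules below are assumptions.
    """
    ids = list(device_ids)
    if not ids:
        raise ValueError("device_ids must name at least one device")
    # bool is a subclass of int in Python, so reject it explicitly.
    if any(isinstance(i, bool) or not isinstance(i, int) or i < 0 for i in ids):
        raise ValueError(f"device_ids must be non-negative integers: {ids!r}")
    if len(set(ids)) != len(ids):
        raise ValueError(f"device_ids contains duplicates: {ids!r}")
    return {"model": model, "device_ids": ids}


if __name__ == "__main__":
    # Placeholder model name; substitute the model you actually serve.
    kwargs = make_llm_kwargs("facebook/opt-125m", [0, 1])
    print(kwargs)
    # With vLLM v0.24.0 installed, pass the arguments through, e.g.:
    #   from vllm import LLM
    #   llm = LLM(**kwargs)
```

Keeping the device list in one validated place makes it easier to find code paths that previously depended on `CUDA_VISIBLE_DEVICES` being set for them.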
Release Summary
v0.24.0 adds extensive support and performance optimizations for new models such as MiniMax-M3 and DeepSeek-V4, matures Model Runner V2 with default quantization support, and overhauls device selection by removing the internal use of `CUDA_VISIBLE_DEVICES`.
Need More Details?
View the full release notes and all changes for vLLM v0.24.0.