Migrating to vLLM v0.23.0
vLLM v0.23.0 introduces two breaking changes. This guide explains how to update your code.
Released: 6/12/2026
⚠️ Check Your Code
If you use any of these symbols, you need to read this guide:
DeepSeek-V4, DeepSeek-V3.2, torch.compile, Llama, Mistral, Qwen3, Gemma 4, transformers, MiniCPM-V/O, Sarvam, Voxtral, Parser.parse(), Eagle, KVConnector, NixlConnector, Mooncake, LMCacheMPConnector, EC connector, Mamba LINEAR attention-module, KDA conv-state, VllmConfig, KVCacheManagerCoordinator
Breaking Changes
● Issue #1
Support for Transformers v4 is deprecated; the library now targets Transformers v5. Users must update their dependencies and ensure compatibility with v5 features.
● Issue #2
The dedicated CUDA graph pool for Eagle has been removed (#44078). If you relied on specific pooling behavior for Eagle, you may need to adjust configurations.
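For the Transformers requirement in Issue #1, a quick way to see which major version an environment has installed is to query the package metadata. The sketch below uses only the Python standard library; the helper names `major_version` and `transformers_status` are hypothetical and not part of vLLM or Transformers, and the simple version parsing assumes a conventional `MAJOR.MINOR.PATCH` string.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '5.1.0'."""
    return int(version.split(".")[0])


def transformers_status(minimum_major: int = 5) -> str:
    """Describe whether the installed transformers package meets the target major version."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if major_version(installed) >= minimum_major:
        return f"transformers {installed} meets the v{minimum_major} target"
    return f"transformers {installed} is older than v{minimum_major}; upgrade it first"


if __name__ == "__main__":
    print(transformers_status())
```

Running this in the environment that serves vLLM, before upgrading, shows whether the dependency needs to be bumped.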
Migration Steps
1. Update dependencies to target Transformers v5, as v4 support is deprecated.
2. Review usage of DeepSeek-V4 to ensure compatibility with decoupled sparse MLA metadata.
3. If using speculative decoding, be aware of changes in lookahead-slot allocation and attention-group splitting.
4. If using NixlConnector, plan migration away from the `kv_both` role.
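To find deployments affected by the NixlConnector step, it can help to scan the KV-transfer configuration strings you pass to vLLM. The sketch below is a minimal illustration: it assumes the configuration is a JSON object with `kv_connector` and `kv_role` keys, which this guide does not confirm, and the helper name `uses_kv_both` is hypothetical. It only detects the `kv_both` role; it does not suggest a replacement, since the guide does not name one.

```python
import json


def uses_kv_both(kv_transfer_config: str) -> bool:
    """Return True if a JSON config pairs NixlConnector with the kv_both role.

    The key names 'kv_connector' and 'kv_role' are assumptions about the
    config layout, not something stated in this guide.
    """
    config = json.loads(kv_transfer_config)
    return (
        config.get("kv_connector") == "NixlConnector"
        and config.get("kv_role") == "kv_both"
    )


if __name__ == "__main__":
    example = '{"kv_connector": "NixlConnector", "kv_role": "kv_both"}'
    print(uses_kv_both(example))
```

A check like this can run over launch scripts or deployment manifests to list the places that need attention before the role is removed.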
Release Summary
v0.23.0 brings significant hardening and optimization for DeepSeek-V4, expands Model Runner V2 to Llama/Mistral models, and advances the experimental Rust frontend. This release also mandates compatibility with Transformers v5.
Need More Details?
View the full release notes and all changes for vLLM v0.23.0.