Migrating to vLLM v0.23.0

vLLM v0.23.0 introduces two breaking changes. This guide explains how to update your code.

Released: 6/12/2026

⚠️ Check Your Code

If you use any of these symbols, you need to read this guide:

DeepSeek-V4, DeepSeek-V3.2, torch.compile, Llama, Mistral, Qwen3, Gemma 4, transformers, MiniCPM-V/O, Sarvam, Voxtral, Parser.parse(), Eagle, KVConnector, NixlConnector, Mooncake, LMCacheMPConnector, EC connector, Mamba LINEAR attention-module, KDA conv-state, VllmConfig, KVCacheManager, Coordinator
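One way to act on this list is to search your own sources for these symbols. The sketch below is a hypothetical helper, not a vLLM tool. `find_affected_symbols` and `scan_tree` are illustrative names, the `AFFECTED` tuple holds only a sample of the symbols above, and plain substring matching will produce false positives (for example, "Llama" in a comment) while missing indirect usage such as model names passed in at runtime.

```python
from pathlib import Path

# Hypothetical sample of the affected symbols listed above; extend as needed.
# Matching is plain substring search, so expect some false positives.
AFFECTED = (
    "DeepSeek-V4",
    "torch.compile",
    "Parser.parse(",
    "NixlConnector",
    "LMCacheMPConnector",
    "VllmConfig",
    "KVCacheManager",
)


def find_affected_symbols(text, symbols=AFFECTED):
    """Return (line_number, symbol) pairs for every symbol found in text.

    Line numbers are 1-based. A line that mentions several symbols
    yields one pair per symbol, in the order the symbols are listed.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for symbol in symbols:
            if symbol in line:
                hits.append((lineno, symbol))
    return hits


def scan_tree(root, pattern="*.py"):
    """Scan every file matching pattern under root; map file path -> hits."""
    report = {}
    for path in Path(root).rglob(pattern):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_affected_symbols(text)
        if hits:
            report[str(path)] = hits
    return report
```

Calling `scan_tree("path/to/your/project")` returns a dictionary you can print or diff between runs; treat each hit as a prompt to read the relevant section of this guide rather than as proof of breakage.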

Breaking Changes

●Issue #1

vLLM now targets Transformers v5, and support for Transformers v4 is deprecated. Update your dependencies and verify that your code is compatible with Transformers v5.
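Before upgrading vLLM, you can confirm which Transformers major version is installed. This is a minimal sketch using only the standard library; `parse_major` and `transformers_major_version` are hypothetical helper names, and the check assumes that "targets Transformers v5" means a 5.x release is expected.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_major(version_string: str) -> int:
    """Extract the leading major version number, e.g. "5.0.0rc1" -> 5."""
    match = re.match(r"\s*(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1))


def transformers_major_version() -> Optional[int]:
    """Return the installed Transformers major version, or None if not installed."""
    try:
        return parse_major(version("transformers"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = transformers_major_version()
    if major is None:
        print("transformers is not installed")
    elif major < 5:
        print(f"transformers {major}.x found; vLLM v0.23.0 targets Transformers v5")
    else:
        print(f"transformers {major}.x found")
```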

●Issue #2

Eagle no longer uses a dedicated CUDA graph pool; it was removed in #44078. If your setup depended on Eagle having its own CUDA graph pool, you may need to adjust your configuration.

Migration Steps

1. Update your dependencies to target Transformers v5; support for v4 is deprecated.
2. If you use DeepSeek-V4, review your setup to ensure it is compatible with the decoupled sparse MLA metadata.
3. If you use speculative decoding, note the changes to lookahead-slot allocation and attention-group splitting.
4. If you use NixlConnector, plan to migrate away from the `kv_both` role.
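To find deployments still using the `kv_both` role, you can inspect the KV-transfer configuration, such as the JSON passed to vLLM's `--kv-transfer-config` option, which carries `kv_connector` and `kv_role` fields. The helper below is a hypothetical sketch; this guide does not say which role replaces `kv_both`, so it only flags configurations for review.

```python
import json
from typing import Any, Dict, Union


def needs_kv_role_review(config: Union[str, Dict[str, Any]]) -> bool:
    """Return True if a KV-transfer config uses NixlConnector with kv_role "kv_both".

    Accepts either a JSON string (as passed on the command line) or an
    already-parsed dict. Missing keys are treated as "not flagged".
    """
    if isinstance(config, str):
        config = json.loads(config)
    return (
        config.get("kv_connector") == "NixlConnector"
        and config.get("kv_role") == "kv_both"
    )


example = '{"kv_connector": "NixlConnector", "kv_role": "kv_both"}'
print(needs_kv_role_review(example))  # True
```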

Release Summary

vLLM v0.23.0 hardens and optimizes DeepSeek-V4 support, extends Model Runner V2 to Llama and Mistral models, and advances the experimental Rust frontend. It also moves vLLM to Transformers v5.

Need More Details?

For the complete list of changes, see the full release notes for vLLM v0.23.0.
