Migrating to vLLM v0.16.0
Version v0.16.0 introduces 2 breaking changes. This guide details how to update your code.
Released: 2/13/2026
2 Breaking Changes · 3 Migration Steps · 29 Affected Symbols
⚠️ Check Your Code
If you use any of these symbols, you need to read this guide:
IPEX, BitBlas, Marlin 24, torchrun, Model Runner V2, ConfigManager, KernelWrapper, KernelRegistry, PluggableLayer, Cascade Attention, Triton attention, FlashInfer, TRTLLM, CUTLASS, DeepGEMM, wvSplitK_fp8, AITER attention backend, fused_add_rmsnorm_pad, KleidiAI INT4 dynamic quant, NEON BFMMLA BF16 paged attention, torch.compile, Mooncake connector, NIXL Connector V2, EPLB, CompressedTensorsW8A16Fp8, ModelOpt MXFP8, ScoreRequest, DeepSeek ReasoningParser, Authorization header handling

Breaking Changes
- Issue #1: The upgrade to PyTorch 2.10 means your environment dependency must be updated to PyTorch 2.10 or newer.
- Issue #2: IPEX is deprecated in favor of vllm-xpu-kernels for XPU platform support.
Migration Steps
1. Update your environment to use PyTorch 2.10 or newer.
2. If you use XPU platforms, replace configurations that rely on IPEX with ones that use vllm-xpu-kernels.
3. Remove dependencies or configurations related to the removed BitBlas and Marlin 24 quantization backends.
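Step 1's minimum-version requirement is easy to get wrong if versions are compared as strings, since "2.10" sorts before "2.9" lexically. The helper below is a hypothetical sketch (not part of vLLM or PyTorch) showing a numeric comparison you might run at startup before assuming the upgrade took effect:

```python
def meets_min_version(installed: str, required: str = "2.10") -> bool:
    """Return True if a dotted version string satisfies the minimum.

    The comparison must be numeric: as plain strings "2.10" < "2.9",
    which would wrongly reject PyTorch 2.10.
    """
    def as_tuple(v: str) -> tuple:
        return tuple(int(part) for part in v.split("."))

    a, b = as_tuple(installed), as_tuple(required)
    # Pad the shorter tuple with zeros so "2.10" compares as "2.10.0".
    length = max(len(a), len(b))
    a += (0,) * (length - len(a))
    b += (0,) * (length - len(b))
    return a >= b

# Example: PyTorch 2.9.x is too old for v0.16.0; 2.10+ is fine.
print(meets_min_version("2.9.1"))   # False
print(meets_min_version("2.10.0"))  # True
```

In real projects, `packaging.version.Version` handles pre-release and local version segments correctly and is the more robust choice; the sketch above only illustrates the numeric-versus-lexical pitfall.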
Release Summary
vLLM v0.16.0 introduces full support for Async scheduling with Pipeline Parallelism, a new Realtime WebSocket API, and a major overhaul of XPU platform support by deprecating IPEX in favor of vllm-xpu-kernels. This release also includes extensive model support additions and performance optimizations across various hardware platforms.
Need More Details?
View the full release notes and all changes for vLLM v0.16.0.