Migrating to vLLM v0.15.0
Version v0.15.0 introduces 2 breaking changes. This guide details how to update your code.
Released: 1/29/2026
Breaking Changes: 2 · Migration Steps: 4 · Affected Symbols: 18
⚠️ Check Your Code
If you use any of these symbols, you need to read this guide:
`vllm:time_per_output_token_seconds`, `vllm:inter_token_latency_seconds`, `StreamingInput`, `torch.compile`, FlashInfer MLA, TRTLLM, DeepGEMM, `CpuCommunicator`, EPLB, `UCX_MEM_MMAP_HOOK_MODE`, DeepSpeedFp8, RTN, HQQ, FlashAttn, `SiluAndMul`, QuantFP8 CustomOp, `AiterFlashAttentionBackend`, llmcompressor

Breaking Changes
●Issue #1
Removed the deprecated metric `vllm:time_per_output_token_seconds`; switch to `vllm:inter_token_latency_seconds`.
●Issue #2
Removed deprecated environment variables.
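Before upgrading, it can help to verify whether your monitoring still references the removed metric. Below is a minimal sketch, assuming your server exposes metrics in the standard Prometheus exposition format; the helper name `find_removed_metrics` and the sample text are illustrative, not part of vLLM.

```python
# Hypothetical helper (not part of vLLM): scan a Prometheus /metrics dump
# for metric names removed in v0.15.0 and report their replacements.
REMOVED_METRICS = {
    "vllm:time_per_output_token_seconds": "vllm:inter_token_latency_seconds",
}

def find_removed_metrics(metrics_text: str) -> dict:
    """Return {removed_metric: replacement} for each removed metric found."""
    found = {}
    for removed, replacement in REMOVED_METRICS.items():
        if removed in metrics_text:
            found[removed] = replacement
    return found

# Example: a fragment of exposition-format output from a pre-0.15 server.
sample = (
    "# TYPE vllm:time_per_output_token_seconds histogram\n"
    'vllm:time_per_output_token_seconds_count{model="m"} 42\n'
)
print(find_removed_metrics(sample))
```

The same substring check works against dashboard JSON or alert-rule files, since the metric name appears verbatim wherever it is queried.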
Migration Steps
1. Replace usage of the deprecated metric `vllm:time_per_output_token_seconds` with `vllm:inter_token_latency_seconds`.
2. If using DeepSpeedFp8 quantization, migrate to an alternative method, as it has been removed.
3. If using RTN quantization, migrate to an alternative method, as it has been removed.
4. If using HQQ quantization, plan a migration to an alternative method, as it is now deprecated.
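The quantization steps above can be sketched as a pre-flight check run before upgrading. This is a hypothetical helper, not a vLLM API, and the exact method strings (`"deepspeedfp"`, `"rtn"`, `"hqq"`) are assumptions about how these methods are named in configuration.

```python
# Hypothetical pre-flight check (not a vLLM API): validate a quantization
# setting against the v0.15.0 removals and deprecations listed above.
# Method-name strings are assumptions for illustration.
REMOVED = {"deepspeedfp", "rtn"}   # removed in v0.15.0
DEPRECATED = {"hqq"}               # still works, but slated for removal

def check_quantization(method):
    """Return 'ok' or 'deprecated'; raise ValueError for removed methods."""
    if method is None:
        return "ok"
    name = method.lower()
    if name in REMOVED:
        raise ValueError(
            f"quantization method {method!r} was removed in vLLM v0.15.0; "
            "migrate to a supported method before upgrading"
        )
    if name in DEPRECATED:
        return "deprecated"
    return "ok"
```

Running such a check in CI against your serving configs surfaces removed methods before a deploy fails at engine startup.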
Release Summary
This release introduces extensive model support, significant performance enhancements across NVIDIA and AMD hardware (especially for MoE and FP4), and new API features like session-based streaming input. Several deprecated metrics and quantization methods have been removed.
Need More Details?
View the full release notes and all changes for vLLM v0.15.0.