
Migrating to vLLM v0.15.0

Version v0.15.0 introduces two breaking changes. This guide details how to update your code.

Released: 1/29/2026

2 Breaking Changes · 4 Migration Steps · 18 Affected Symbols

⚠️ Check Your Code

If you use any of these symbols, you need to read this guide:

`vllm:time_per_output_token_seconds`, `vllm:inter_token_latency_seconds`, `StreamingInput`, `torch.compile`, FlashInfer MLA, TRTLLM, DeepGEMM, `CpuCommunicator`, EPLB, `UCX_MEM_MMAP_HOOK_MODE`, DeepSpeedFp8, RTN, HQQ, FlashAttn, SiluAndMul, QuantFP8 CustomOp, `AiterFlashAttentionBackend`, llmcompressor

Breaking Changes

Issue #1

Removed the deprecated metric `vllm:time_per_output_token_seconds`; use `vllm:inter_token_latency_seconds` instead.
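If you reference the removed metric in Prometheus queries, alerts, or dashboards, the rename is a straight string substitution. A minimal sketch, assuming a PromQL latency query of your own (the example expression below is hypothetical, not taken from any shipped dashboard):

```python
# Rewrite PromQL expressions that reference the removed vLLM metric.
OLD_METRIC = "vllm:time_per_output_token_seconds"
NEW_METRIC = "vllm:inter_token_latency_seconds"

def migrate_expr(expr: str) -> str:
    """Replace the removed metric name (and derived series such as
    the *_bucket histogram series) in a PromQL expression."""
    return expr.replace(OLD_METRIC, NEW_METRIC)

# Example: a hypothetical p95 latency query from a monitoring dashboard.
old_query = (
    "histogram_quantile(0.95, sum(rate("
    "vllm:time_per_output_token_seconds_bucket[5m])) by (le))"
)
print(migrate_expr(old_query))
```

Because `replace` matches the metric name as a prefix, derived series names like `vllm:time_per_output_token_seconds_bucket` are rewritten in the same pass.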

Issue #2

Removed deprecated environment variables.

Migration Steps

  1. Replace usage of the deprecated metric `vllm:time_per_output_token_seconds` with `vllm:inter_token_latency_seconds`.
  2. If using DeepSpeedFp8 quantization, migrate to an alternative method, as it has been removed.
  3. If using RTN quantization, migrate to an alternative method, as it has been removed.
  4. If using HQQ quantization, migrate to an alternative method, as it is now deprecated.
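Steps 2 through 4 can be caught before upgrading with a pre-flight check on your configured quantization method. This is a sketch under assumptions: the method-name strings below are illustrative, and the suggestion to re-quantize via llmcompressor is inferred from its appearance in the affected-symbols list, not official migration guidance.

```python
# Pre-flight check for quantization settings affected by vLLM v0.15.0.
REMOVED = {"deepspeedfp8", "rtn"}   # removed in v0.15.0
DEPRECATED = {"hqq"}                # deprecated; removal planned

def check_quantization(method: str) -> str:
    """Raise for removed methods, warn for deprecated ones."""
    m = method.lower()
    if m in REMOVED:
        raise ValueError(
            f"Quantization method {method!r} was removed in vLLM v0.15.0; "
            "re-quantize the model with a supported method "
            "(for example via llmcompressor)."
        )
    if m in DEPRECATED:
        print(f"warning: {method!r} is deprecated; plan a migration now.")
    return m

check_quantization("fp8")   # passes
check_quantization("HQQ")   # prints a deprecation warning
```

Running this against each deployment's config before rolling out v0.15.0 surfaces the removed methods as hard errors rather than runtime failures.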

Release Summary

This release introduces extensive model support, significant performance enhancements across NVIDIA and AMD hardware (especially for MoE and FP4), and new API features like session-based streaming input. Several deprecated metrics and quantization methods have been removed.

Need More Details?

View the full release notes and all changes for vLLM v0.15.0.
