Error (3 reports)

Fix `OutOfMemoryError` in vLLM

✅ Solution

This error typically occurs in vLLM. Check the related issues listed on this page for the cause and workaround reported in each case.
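As a general starting point, memory pressure in vLLM is often reduced by lowering the fraction of GPU memory the engine reserves, capping the context length, and limiting the number of concurrent sequences. The sketch below assembles a `vllm serve` command line with those three options. The helper name `build_serve_args`, the model name, and the default values are illustrative assumptions, not recommendations from the linked issues, and these settings may not help with the specific bugs listed on this page (for example, host-RAM exhaustion during checkpoint loading).

```python
from typing import List


def build_serve_args(
    model: str,
    gpu_memory_utilization: float = 0.85,  # illustrative value, not a recommendation
    max_model_len: int = 8192,             # illustrative cap on context length
    max_num_seqs: int = 64,                # illustrative cap on concurrent sequences
) -> List[str]:
    """Build a `vllm serve` argument list with conservative memory settings.

    Hypothetical helper: it only formats the command and does not start vLLM.
    """
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    if max_model_len <= 0 or max_num_seqs <= 0:
        raise ValueError("max_model_len and max_num_seqs must be positive")

    return [
        "vllm", "serve", model,
        # Fraction of each GPU's memory the engine may reserve.
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        # Shorter maximum context means a smaller per-sequence KV cache.
        "--max-model-len", str(max_model_len),
        # Fewer concurrent sequences means less KV cache in use at once.
        "--max-num-seqs", str(max_num_seqs),
    ]


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder model name.
    print(" ".join(build_serve_args("my-org/my-model")))
```

The resulting list can be passed to `subprocess.run` or joined into a shell command. If the error persists with conservative settings, the cause is more likely one of the issue-specific bugs than plain overcommitment.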

Related Issues

Real GitHub issues where developers encountered this error:

[Bug]: GLM-5 FP8 OOM for long inputs at `flash_mla_cuda.sparse_decode_fwd` on H200 (Jun 4, 2026)

[Bug]: Online FP8 (`--quantization fp8`) over-allocates non-gated MoE `w13` (2×intermediate), causing OOM — NemotronH on a single GPU (Jun 4, 2026)

[Bug]: `--load-format runai_streamer` retains ~the full checkpoint in host RAM on every TP worker under the Ray executor → host-OOM for large models; regression from #43464 (Jun 3, 2026)

Timeline

First reported: Jun 3, 2026

Last reported: Jun 4, 2026

Need More Help?

View the full changelog and migration guides for vLLM