
v0.17.1

📦 vllm
🐛 6 fixes · 🔧 8 symbols

Summary

This patch release addresses several issues primarily related to TRTLLM MoE backends, Mamba/Qwen SSM caching, and MTP handling.

🐛 Bug Fixes

  • Fixed passing of activation_type to the TRTLLM fused MoE NVFP4 and FP8 backends.
  • Fixed and restored support for non-gated fused MoE with Triton.
  • Re-enabled expert parallelism (EP) for the TRTLLM MoE FP8 backend.
  • Zeroed freed SSM cache blocks on GPU for Mamba and Qwen3.5.
  • Fixed TRTLLM Block FP8 MoE Monolithic.
  • Optimized Indexer MTP handling for DSV3.2.
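
The SSM cache fix above matters because a recurrent state block that is reused without being cleared can carry the previous request's state into the next one. The following is a minimal sketch of that idea in plain Python, not vLLM's actual cache manager: `ToyStateCache`, `run_request`, and the `zero` flag are hypothetical names made up for illustration, and the real state lives in GPU tensors rather than Python lists.

```python
# Illustrative only: a toy block pool, not vLLM's cache implementation.
class ToyStateCache:
    def __init__(self, num_blocks: int, state_size: int) -> None:
        # One flat state vector per block, initially all zeros.
        self.blocks = [[0.0] * state_size for _ in range(num_blocks)]
        self.free_ids = list(range(num_blocks))

    def allocate(self) -> int:
        # Hand out a free block; its contents are whatever was left in it.
        return self.free_ids.pop()

    def free(self, block_id: int, zero: bool = True) -> None:
        if zero:
            # Clear the recurrent state so the next owner starts from zeros.
            block = self.blocks[block_id]
            for i in range(len(block)):
                block[i] = 0.0
        self.free_ids.append(block_id)


def run_request(cache: ToyStateCache, inputs: list[float]) -> tuple[int, float]:
    """Fold inputs into a block's state, standing in for an SSM state update."""
    block_id = cache.allocate()
    state = cache.blocks[block_id]
    for x in inputs:
        state[0] = 0.5 * state[0] + x
    return block_id, state[0]
```

In this toy model, freeing with `zero=False` lets the second request start from the first request's final state, so its output depends on unrelated earlier traffic. Freeing with the default `zero=True` makes every request start from the same zero state.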

Affected Symbols