v0.17.1
📦 vllm
🐛 6 fixes 🔧 8 symbols
Summary
This patch release fixes issues in the TRTLLM MoE backends, Mamba/Qwen SSM caching, and MTP handling.
🐛 Bug Fixes
- Fixed passing of activation_type to the TRTLLM fused MoE NVFP4 and FP8 backends.
- Fixed and restored support for non-gated fused MoE with Triton.
- Re-enabled expert parallelism (EP) for the TRTLLM MoE FP8 backend.
- Freed SSM cache blocks are now zeroed on the GPU for Mamba and Qwen3.5.
- Fixed TRTLLM Block FP8 MoE Monolithic.
- Optimized Indexer MTP handling for DSV3.2.
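
To illustrate why zeroing freed SSM cache blocks matters, here is a minimal sketch in plain Python. It is not vLLM's implementation: the `SSMBlockPool` class, its methods, and the list-based "blocks" are hypothetical stand-ins for GPU state buffers. The point is only that a recurrent state block, unlike an attention KV block, is read before it is fully overwritten, so a reused block that still holds another request's state could corrupt the new request unless it is cleared on free.

```python
class SSMBlockPool:
    """Toy pool of fixed-size state blocks (hypothetical, not vLLM's API)."""

    def __init__(self, num_blocks, block_size):
        # Each block stands in for one request's recurrent (SSM) state.
        self.blocks = [[0.0] * block_size for _ in range(num_blocks)]
        self.free_ids = list(range(num_blocks))

    def allocate(self):
        # Hand out the most recently freed block id.
        return self.free_ids.pop()

    def free(self, block_id, zero=True):
        # Clear the block in place before returning it to the pool, so the
        # next request starts from an all-zero state.
        if zero:
            block = self.blocks[block_id]
            for i in range(len(block)):
                block[i] = 0.0
        self.free_ids.append(block_id)


pool = SSMBlockPool(num_blocks=2, block_size=4)
first = pool.allocate()
pool.blocks[first][:] = [1.0, 2.0, 3.0, 4.0]  # state written by request A
pool.free(first)                              # zeroed on free
reused = pool.allocate()                      # request B gets the same block
```

With `zero=False`, the block handed to request B would still contain request A's values, which is the kind of stale-state reuse this sketch is meant to show.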