v3.3.1
📦 tgi
✨ 2 features · 🐛 2 fixes · 🔧 3 symbols
Summary
This release updates TGI to Torch 2.7 and CUDA 12.8, incorporating HPU warmup logic refinements, kernel updates, and bug fixes.
Migration Steps
- Update to Torch 2.7.0 and Synapse AI 1.21.0 for optimal performance and compatibility.
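
Before upgrading, it can help to confirm that the installed versions meet the minimums named above. The following is a minimal sketch, not part of TGI: `parse_version` and `meets_minimum` are hypothetical helper names, and the comparison only looks at the leading numeric part of a version string (a local suffix such as `+cu128` is ignored).

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment of a version string.

    Hypothetical helper for illustration: "2.7.0+cu128" becomes (2, 7, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Check installed >= minimum, padding the shorter version with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # With PyTorch installed, the real attribute torch.__version__ could be
    # passed as the first argument; a literal string is used here instead.
    print(meets_minimum("2.7.0+cu128", "2.7.0"))
```

The same function can be applied to a Synapse AI version string against `1.21.0`, however that string is obtained in your environment.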
✨ New Features
- Enable Llama4 for gaudi backend
- Switch to punica-sgmv kernel from the Hub
🐛 Bug Fixes
- Fix crash in default ATTENTION path
- Correctly count GPU UUIDs when the NVIDIA_VISIBLE_DEVICES environment variable is set to all
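
To illustrate the second fix: `NVIDIA_VISIBLE_DEVICES` normally holds a comma-separated list of GPU indices or UUIDs, but it can also hold the special value `all`, which a naive comma split would count as a single device. The sketch below shows one way to handle that case. It is an assumption-laden illustration, not TGI's actual code: `count_visible_gpus` is a hypothetical name, and the list of detected UUIDs is passed in by the caller rather than queried from the driver.

```python
from typing import Mapping, Optional, Sequence


def count_visible_gpus(
    env: Mapping[str, str], detected_uuids: Sequence[str]
) -> Optional[int]:
    """Count GPUs selected by NVIDIA_VISIBLE_DEVICES (hypothetical helper).

    env:            environment mapping, e.g. os.environ
    detected_uuids: UUIDs of every GPU found on the host, supplied by the caller
    Returns None when the variable is unset so the caller can fall back
    to another detection method.
    """
    value = env.get("NVIDIA_VISIBLE_DEVICES")
    if value is None:
        return None

    value = value.strip()
    if value == "all":
        # "all" selects every GPU on the host, not one device named "all".
        return len(detected_uuids)
    if value in ("", "void", "none"):
        # Assumption: treat these special values as "no GPUs selected".
        return 0

    # Otherwise count the non-empty comma-separated entries.
    return len([entry for entry in value.split(",") if entry.strip()])


if __name__ == "__main__":
    host_gpus = ["GPU-aaaa", "GPU-bbbb"]  # placeholder UUIDs for illustration
    print(count_visible_gpus({"NVIDIA_VISIBLE_DEVICES": "all"}, host_gpus))
```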
🔧 Affected Symbols
- HPU warmup logic
- round_up_seq logic
- default ATTENTION path