v3.3.5
📦 tgi
⚠ 1 breaking · ✨ 8 features · 🐛 5 fixes · 🔧 1 symbol
Summary
This release introduces a migration to the Pydantic V2 interface, XPU LoRA support, and a set of hardware acceleration updates, including Gaudi optimizations for models such as Gemma3 and DeepSeek V2. It also bumps core dependencies such as transformers and huggingface_hub.
⚠️ Breaking Changes
- The migration to the Pydantic V2 interface may break configurations or code that relies on the V1 interface. Review any custom Pydantic models or configuration-loading logic.
Migration Steps
- If you were using Pydantic V1 interfaces, update your code to the Pydantic V2 interface; the most common changes are sketched below.
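For illustration, here is a minimal sketch of typical V1-to-V2 changes. The GenerationConfig model and its fields are hypothetical placeholders, not classes from tgi itself; only the Pydantic calls are real.

```python
# Hypothetical config model illustrating typical Pydantic V1 -> V2 changes.
from pydantic import BaseModel, ConfigDict, field_validator


class GenerationConfig(BaseModel):
    # V1: `class Config: extra = "forbid"`  ->  V2: `model_config = ConfigDict(...)`
    model_config = ConfigDict(extra="forbid")

    max_new_tokens: int = 20
    temperature: float = 1.0

    # V1: `@validator("temperature")`  ->  V2: `@field_validator("temperature")`
    @field_validator("temperature")
    @classmethod
    def check_temperature(cls, value: float) -> float:
        if value <= 0:
            raise ValueError("temperature must be positive")
        return value


# V1: GenerationConfig.parse_obj(data) and cfg.dict()
# V2: GenerationConfig.model_validate(data) and cfg.model_dump()
cfg = GenerationConfig.model_validate({"max_new_tokens": 64, "temperature": 0.7})
print(cfg.model_dump())
```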
✨ New Features
- Refined RoPE memory management on Gaudi, removing the need to keep a per-layer sin/cos cache.
- Added support for the Gemma3 sliding window on Gaudi.
- Added XPU LoRA support (see the sketch after this list).
- Updated the Optimum Neuron dependency to version 0.3.0 (previously 0.2.2).
- Added DeepSeek V2 MLA and support for adding expert parallelism (EP) to unquantized MoE on Gaudi.
- Added HPU GPTQ g_idx support.
- Added XCCL support.
- Bumped flake dependencies, including the transformers and huggingface_hub versions.
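As a usage sketch for the LoRA support, the request below selects an adapter per request against a running TGI endpoint. The URL and adapter name are placeholders, and the adapter_id parameter is assumed to follow TGI's documented multi-LoRA request format.

```python
# Hedged sketch: send a generation request to a running TGI server and select
# a LoRA adapter per request. URL and adapter id are placeholders; this assumes
# the /generate endpoint accepts an `adapter_id` parameter as in TGI's
# multi-LoRA documentation.
import requests

TGI_URL = "http://localhost:8080/generate"  # placeholder endpoint

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {
        "max_new_tokens": 64,
        "adapter_id": "my-org/my-lora-adapter",  # hypothetical adapter id
    },
}

response = requests.post(TGI_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["generated_text"])
```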
🐛 Bug Fixes
- Fixed CI test errors related to Gaudi.
- Fixed an issue where some GPTQ cases could not be handled by IPEX; TGI now handles these cases itself.
- Fixed an outlines import issue.
- Fixed a crash with the HuggingFaceM4/Idefics3-8B-Llama3 model.
- Fixed multi-modality-related issues.
🔧 Affected Symbols
HeterogeneousNextTokenChooser