TGI
Large Language Model Text Generation Inference
Release History
v3.3.7 (1 fix, 1 feature): Introduces support for limiting the image-fetching size and fixes an issue with automatic device count computation. The project is also entering maintenance mode.
v3.3.6 (2 fixes): A patch release focused on bug fixes, including a correction to flashinfer masking and the removal of Azure references, alongside minor documentation and code cleanup.
v3.3.5 (breaking; 5 fixes, 8 features): Brings significant hardware-acceleration updates, including a migration to Pydantic V2, XPU LoRA support, and various Gaudi optimizations for models such as Gemma 3 and DeepSeek V2. Also bumps core dependencies such as transformers and huggingface_hub.
v3.3.4 (1 fix, 2 features): Adds initial support for Gemma 3 models on Gaudi and fixes a bug affecting Neuron models exported with batch_size 1.
v3.3.3 (4 fixes, 1 feature): Updates the Neuron backend, including an SDK version bump, and adds support for the Qwen3_moe model on Gaudi. Several Gaudi-specific fixes and performance optimizations are also included.
v3.3.2 (3 fixes, 2 features): Focuses on Gaudi improvements, including OOM fixes and new hardware support, alongside an upgrade of the vllm extension operations and the addition of the Qwen3 model.
v3.3.1 (2 fixes, 2 features): Updates TGI to Torch 2.7 and CUDA 12.8, with HPU warmup-logic refinements, kernel updates, and bug fixes.
v3.3.0 (15 fixes, 4 features): Introduces prefill chunking for VLMs and numerous stability fixes across hardware backends such as Gaudi and L4, along with dependency bumps and model-specific support enhancements.
v3.2.3 (1 fix, 1 feature): Adds a patch for Llama 4 and updates underlying dependencies such as ROCm and transformers. Also fixes a compute-type typo.
v3.2.2 (1 fix, 3 features): Adds support for the Llama 4 model and a configurable termination timeout, and includes several fixes, notably for Gaudi hardware.
v3.2.1 (2 fixes, 2 features): Adds support for the Gemma 3 text model type and marks the official release of the Gaudi backend. Also includes updates needed for Triton kernel compilation and various bug fixes.
v3.2.0 (breaking; 6 fixes, 3 features): Adds support for the Gemma 3 model and significantly updates tool-calling behavior to align it more closely with OpenAI's specification, alongside various backend- and model-specific bug fixes.
v3.1.1 (14 fixes, 9 features): Focuses on backend expansion, adding Llamacpp, Neuron, and Gaudi backends, alongside significant improvements to Qwen VL handling and template features. Also includes various stability fixes and dependency updates.
v3.1.0 (4 fixes, 3 features): Adds full hardware support for DeepSeek R1 on AMD and NVIDIA, fp8 support for MoE models, and several stability fixes and dependency updates.
v3.0.2 (14 fixes, 11 features): Introduces a major new transformers backend that enables flash attention for otherwise unsupported models, and adds several new models including Cohere2 and OLMo variants. Numerous bug fixes target model-specific issues, VLM handling, and hardware acceleration across CUDA, ROCm, and XPU.
Common Errors
ModuleNotFoundError (4 reports): A "ModuleNotFoundError" in TGI usually indicates that a required Python package is missing from your environment. Identify the missing module from the error message and install it with `pip install <missing_module_name>`. Note that some modules, such as `punica_sgmv`, are custom kernels that ship with TGI's source tree and are built during a from-source install rather than installed from PyPI; if one of these is missing, rebuild the server following the installation docs.
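When a module is optional (for example, a hardware-specific kernel), a defensive import can surface a clearer message than a raw traceback. A minimal sketch; the helper name is illustrative and not part of TGI:

```python
import importlib

def optional_import(name: str):
    """Try to import an optional module; return None with a hint if it is absent."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        print(f"Optional module '{name}' not found; "
              f"install it or rebuild TGI with its kernels enabled.")
        return None

# Returns None (with a hint) unless the kernel module is actually installed.
kernels = optional_import("punica_sgmv")
```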
NotImplementedError (4 reports): A NotImplementedError in TGI usually means a specific feature or model architecture hasn't been fully implemented yet. Check the TGI documentation or issue tracker for updates implementing the required functionality or for workarounds. If support is absent, you may need to contribute the missing implementation yourself or wait for the TGI team to add it.
LocalEntryNotFoundError (2 reports): This error is raised by huggingface_hub when a requested file (model weights, tokenizer, or config) is not present in the local cache and cannot be downloaded, typically because the machine is offline, `HF_HUB_OFFLINE=1` is set, or the model ID or revision is wrong. Verify the model ID and revision, ensure the server can reach the Hugging Face Hub, and pre-download the model (e.g. with `huggingface-cli download <model_id>`) if TGI must run without network access.
ZeroDivisionError (1 report): A ZeroDivisionError occurs when code divides by zero. Check the denominator at the location shown in the traceback and ensure it can never be zero, either by guarding the division with a conditional that substitutes a safe default value or by skipping the division altogether.
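The guard-with-default pattern can be sketched as follows (the function name is illustrative):

```python
def safe_div(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Divide, falling back to a default when the denominator is zero."""
    if denominator == 0:
        return default
    return numerator / denominator

print(safe_div(10, 2))  # 5.0
print(safe_div(10, 0))  # 0.0 -- the default, instead of a ZeroDivisionError
```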
BadRequestError (1 report): A "BadRequestError" in TGI often arises from inconsistencies between the client's request parameters and the API's expected input format, such as incorrect data types or missing required fields. Carefully review the API documentation and ensure your client requests match the expected schema, validating the data types, names, and presence of required parameters. Consider using a tool like Postman or a Python library with request validation during development to catch discrepancies before production deployment.
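A lightweight client-side check can catch malformed payloads before they hit the server. The sketch below assumes TGI's standard `/generate` request shape (`inputs` string plus a `parameters` object); the validator function itself is illustrative, not part of any library:

```python
def validate_generate_payload(payload: dict) -> list[str]:
    """Return a list of problems with a TGI /generate request body (empty if OK)."""
    errors = []
    if not isinstance(payload.get("inputs"), str):
        errors.append("'inputs' must be a string")
    params = payload.get("parameters", {})
    if not isinstance(params, dict):
        errors.append("'parameters' must be an object")
        return errors
    max_new = params.get("max_new_tokens")
    if max_new is not None and (isinstance(max_new, bool) or not isinstance(max_new, int) or max_new <= 0):
        errors.append("'max_new_tokens' must be a positive integer")
    temp = params.get("temperature")
    if temp is not None and (isinstance(temp, bool) or not isinstance(temp, (int, float)) or temp <= 0):
        errors.append("'temperature' must be a positive number")
    return errors

print(validate_generate_payload({"inputs": "Hello", "parameters": {"max_new_tokens": 20}}))  # []
```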
ConnectionResetError (1 report): A ConnectionResetError in TGI often arises when the server unexpectedly closes a connection while the client is still sending or receiving data, frequently due to timeouts or exceeded server limits. Implement keep-alive mechanisms on both client and server sides to maintain persistent connections, and adjust limits in the TGI configuration (e.g. `--max-concurrent-requests`, `--max-input-length`) to accommodate the expected workload. Also check for resource exhaustion on the server.
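On the client side, transient resets can be absorbed with a retry-and-backoff wrapper. A minimal sketch (the wrapper and its parameters are illustrative, not a TGI API):

```python
import time

def with_retries(fn, attempts: int = 3, backoff: float = 0.1):
    """Call fn(), retrying on ConnectionResetError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionResetError:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            time.sleep(backoff * (2 ** attempt))

# Simulate a flaky request that fails once, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionResetError("server closed the connection")
    return "ok"

print(with_retries(flaky))  # ok
```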
Related AI & LLMs Packages
AutoGPT: the vision of accessible AI for everyone, to use and to build on. Its mission is to provide the tools so that you can focus on what matters.
Ollama: Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
LangChain: 🦜🔗 The platform for reliable agents.
ComfyUI: The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
llama.cpp: LLM inference in C/C++.
GPT4All: Run local LLMs on any device. Open-source and available for commercial use.