Change8

Unsloth

AI & LLMs

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.

Latest: February-2026 · 15 releases · 4 breaking changes · 20 common errors · View on GitHub

Release History

February-2026 (Breaking) · 33 fixes · 7 features
Feb 10, 2026

This release introduces major performance enhancements, including 12x faster MoE training and ultra-long context RL capabilities, alongside support for several new state-of-the-art models. Numerous bug fixes address stability across various architectures and dependencies.

December-2025 (Breaking) · 11 fixes · 13 features
Dec 18, 2025

This release introduces massive performance gains with 3x faster training via new Triton kernels and enables 500K context length fine-tuning. It also adds support for Transformers v5, preliminary multi-GPU training, and several new model guides.

November-2025 · 15 fixes · 11 features
Nov 25, 2025

This release introduces major performance enhancements with FP8 Reinforcement Learning support and significant VRAM reductions across the board. It also adds support for new models like DeepSeek-OCR and Qwen3-VL, alongside improved Docker integration.

October-2025 · 24 fixes · 14 features
Oct 27, 2025

This release introduces major platform support, including Docker images and Blackwell/DGX hardware compatibility, alongside significant new features like Quantization-Aware Training (QAT) and extensive RL environment utilities.

September-2025-v3 (Breaking) · 4 fixes · 4 features
Sep 26, 2025

This release introduces significant performance gains and new capabilities for gpt-oss Reinforcement Learning, alongside support for new models like DeepSeek-V3.1-Terminus and Magistral 1.2. Several bug fixes were implemented, including resolving issues with BERT and QAT + LoRA fast path.

September-2025-v2 · 12 fixes · 8 features
Sep 16, 2025

This release introduces major performance enhancements and new capabilities for Vision models in Reinforcement Learning (RL), alongside the new 'Standby' feature for memory-efficient training. Numerous bug fixes and improvements were also integrated across various components, including Intel/ROCm support and serialization workflows.

August-2025-v2 · 9 fixes · 7 features
Aug 28, 2025

This release introduces Unsloth Flex Attention for gpt-oss training, drastically improving context length, VRAM efficiency, and speed. Numerous bug fixes and support for new models/features like QAT + LoRA are also included.

August-2025 · 19 fixes · 8 features
Aug 8, 2025

This release introduces broad support for the new gpt-oss model, enabling low-VRAM fine-tuning, alongside significant algorithmic updates that improve performance across all models. It also adds support for Qwen3 models and expands compatibility to include newer NVIDIA hardware like RTX 50 series and Blackwell GPUs.

July-2025 · 18 fixes · 7 features
Jul 10, 2025

This release focuses heavily on stability, VRAM reduction (10-25% less), and broad model compatibility, including full fixes for Gemma 3N Vision and support for new models like Devstral 1.1 and MedGemma.

June-2025 (Breaking) · 23 fixes · 16 features
Jun 26, 2025

This release introduces major new capabilities including support for multimodal Gemma 3n models and Text-to-Speech fine-tuning, alongside new quantization methods (Dynamic 2.0 GGUFs) and support for DeepSeek-R1-0528 and Magistral-24B.

May-2025 · 12 fixes · 5 features
May 2, 2025

This release introduces official support for Qwen3 models, including fine-tuning capabilities for the 30B MoE variant. Numerous bug fixes address compatibility issues, quantization errors, and improve overall stability.

2025-03 · 7 fixes · 15 features
Mar 14, 2025

The March release introduces full support for finetuning Gemma 3 models and significantly expands model compatibility, including Mixtral and vision models, alongside preliminary support for 8bit and full finetuning. This version also brings Windows support and removes the compilation requirement for GGUF exports.

2025-02-v2 · 5 fixes · 3 features
Feb 20, 2025

This release introduces GRPO, achieving up to 90% memory reduction during training, alongside various bug fixes and updates to support Llama 3.1 8B training.

2025-02 · 9 fixes · 7 features
Feb 6, 2025

This release introduces major support for GRPO training, enabling LoRA/QLoRA for GRPO across various models, and integrates fast inference via vLLM for significant throughput gains. Numerous bug fixes address issues with Gemma 2, Mistral mapping, and general stability.

2025-01 · 14 fixes · 6 features
Jan 10, 2025

This release introduces full support for the Phi-4 model, including fixes for tokenization and chat templates, alongside significant bug fixes for gradient accumulation, vision models, and performance regressions. It also brings Windows support and performance improvements via updated Xformers.

Common Errors

OutOfMemoryError · 5 reports

OutOfMemoryError typically arises when the model and training data exceed available GPU memory. Reduce the batch size, use gradient accumulation, enable CPU offloading where supported, or switch to a smaller model to shrink the memory footprint. If the problem persists, consider a larger GPU or distributing training across multiple GPUs.
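One sketch of the batch-size/accumulation trade-off mentioned above: the effective batch size seen by the optimizer stays constant while fewer samples sit on the GPU per forward pass. The helper and numbers are illustrative, not Unsloth APIs.

```python
# Illustrative only: shows why raising gradient accumulation lets you
# lower the per-device batch size without changing the optimizer's view.

def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus=1):
    """Samples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus

# Original config: 32 samples live on the GPU at once and trigger OOM.
assert effective_batch_size(32, 1) == 32

# Lower-memory equivalent: 4 samples per forward pass, gradients
# accumulated over 8 micro-batches before each weight update.
assert effective_batch_size(4, 8) == 32
```

In trainer configs this corresponds to lowering the per-device batch size while raising the gradient-accumulation steps by the same factor.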

TorchRuntimeError · 3 reports

TorchRuntimeError in Unsloth often arises from incorrect tensor shapes, mismatched data types, or unexpected values during operations, particularly in custom CUDA kernels. Ensure inputs to Unsloth's optimized functions have the dimensions and data types the function signatures or kernel implementations expect, and verify that no NaN or Inf values are present in the tensors, since these propagate and cause failures inside kernels.
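A minimal pre-flight check along those lines, using nested lists to stand in for tensors so it runs without PyTorch; with torch installed the equivalent checks are `tensor.shape == expected_shape` and `torch.isfinite(tensor).all()`. The function name is hypothetical.

```python
import math

def validate_batch(batch, expected_shape):
    """Check shape and finiteness of values headed into a fused kernel."""
    # Shape check: walk the nesting and record each dimension.
    shape, level = [], batch
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0]
    if tuple(shape) != tuple(expected_shape):
        raise ValueError(f"shape {tuple(shape)} != expected {tuple(expected_shape)}")

    # Finite-value check: NaN/Inf propagate silently through kernels.
    def flat(x):
        for v in x:
            if isinstance(v, list):
                yield from flat(v)
            else:
                yield v
    if any(not math.isfinite(v) for v in flat(batch)):
        raise ValueError("batch contains NaN or Inf")
    return True

assert validate_batch([[1.0, 2.0], [3.0, 4.0]], (2, 2)) is True
```

Running such a check on a failing batch before calling into the optimized path usually localizes the bad input faster than decoding a kernel traceback.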

CalledProcessError · 3 reports

CalledProcessError in Unsloth often arises when external processes invoked during GGUF conversion, such as `llama.cpp` tools, fail due to insufficient system resources (RAM or disk space) or incorrect file paths. To fix this, ensure you have ample free RAM and disk space before converting, and double-check that all file paths specified in your arguments, especially the model path, exist and are correct. If resource limits persist, reduce batch sizes or the number of threads to lower memory requirements.
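A small pre-flight check for the two conditions named above (missing path, low disk space) catches these before the subprocess fails. The function name and the free-space threshold are illustrative, not Unsloth requirements; full-precision GGUF exports can need several times the model's on-disk size.

```python
import os
import shutil

def preflight_gguf_conversion(model_path, min_free_gb=50):
    """Fail fast on conditions that commonly surface as CalledProcessError."""
    if not os.path.exists(model_path):
        raise FileNotFoundError(f"model path does not exist: {model_path}")
    # Free space on the filesystem holding the model (illustrative threshold).
    target_dir = os.path.dirname(os.path.abspath(model_path))
    free_gb = shutil.disk_usage(target_dir).free / 1e9
    if free_gb < min_free_gb:
        raise RuntimeError(f"only {free_gb:.1f} GB free; need {min_free_gb} GB")
    return True
```

Calling this with the same path you pass to the conversion step turns an opaque subprocess failure into a direct error message.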

ChildFailedError · 2 reports

ChildFailedError in Unsloth with multi-GPU setups usually arises from failures while synchronizing model parameters across GPUs during distributed training, particularly when launching with the deprecated `torch.distributed.launch`. To fix it, launch your training script with `torchrun` instead, and ensure all GPUs are visible and properly configured via the `CUDA_VISIBLE_DEVICES` environment variable before launching.
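A sketch of the launch setup described above; the device IDs and `train.py` script name are placeholders for your own. Worker processes inherit the environment, so `CUDA_VISIBLE_DEVICES` must be set before the launcher spawns them.

```python
import os

# Make both GPUs visible before the launcher forks workers (IDs are
# placeholders; match them to your machine).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
num_gpus = len(os.environ["CUDA_VISIBLE_DEVICES"].split(","))

# Launch with torchrun rather than the deprecated torch.distributed.launch:
#   torchrun --nproc_per_node=2 train.py
launch_cmd = f"torchrun --nproc_per_node={num_gpus} train.py"
print(launch_cmd)
```

Keeping `--nproc_per_node` derived from `CUDA_VISIBLE_DEVICES` avoids the mismatch between visible devices and spawned workers that often underlies this error.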

ModuleNotFoundError · 2 reports

ModuleNotFoundError in Unsloth usually arises from required packages that were not installed alongside the core library, or from incorrect installation paths. Resolve this by first ensuring you've installed Unsloth via pip (`pip install unsloth`), including any necessary extras with `pip install "unsloth[extra_dependencies]"`, where `extra_dependencies` is the relevant dependency group for your setup. If issues persist, double-check your Python environment and `PYTHONPATH` to ensure Unsloth's installation directory is correctly included.
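A quick diagnostic for the environment-mismatch case: check whether the module resolves from the interpreter you are actually running, since `pip` may have installed into a different one. The helper name is hypothetical.

```python
import importlib.util
import sys

def diagnose_missing(module_name):
    """Report whether a module is importable from this interpreter."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return (f"{module_name} not found for {sys.executable}; "
                f"install it into this environment")
    return f"{module_name} resolves to {spec.origin}"

# A stdlib module resolves; a missing one names the interpreter to fix.
print(diagnose_missing("json"))
print(diagnose_missing("definitely_not_installed_pkg"))
```

If the report names an unexpected interpreter, install with `python -m pip install unsloth` using that exact interpreter rather than bare `pip`.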

ArgsMismatchError · 2 reports

ArgsMismatchError in Unsloth often stems from a mismatch between the arguments a function or layer expects and the arguments actually supplied, especially after applying LoRA. To fix this, ensure your LoRA configuration aligns with the base model's architecture: carefully check parameter names and shapes when applying LoRA, and verify that the arguments passed to the model during inference match the model definition in its configuration file.
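One way to catch the name-alignment problem early is to compare the configured LoRA target modules against the names the model actually exposes, as in this sketch. The checker is hypothetical; the module names mimic a PEFT-style `target_modules` list and a Llama-style layer layout.

```python
def check_lora_targets(target_modules, model_module_names):
    """Return configured LoRA target names absent from the model."""
    available = set(model_module_names)
    return [name for name in target_modules if name not in available]

# Typical attention/MLP projection names for a Llama-style model (illustrative).
model_modules = {"q_proj", "k_proj", "v_proj", "o_proj",
                 "gate_proj", "up_proj", "down_proj"}

assert check_lora_targets(["q_proj", "v_proj"], model_modules) == []
# A typo such as "qproj" surfaces before training starts.
assert check_lora_targets(["qproj"], model_modules) == ["qproj"]
```

With a real model you would build the second argument from the suffixes of the model's named submodules before applying the adapter.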
