Changelog

November 2025

📦 unsloth
✨ 11 features · 🐛 15 fixes · 🔧 7 symbols

Summary

This release introduces major performance enhancements with FP8 Reinforcement Learning support and significant VRAM reductions across the board. It also adds support for new models like DeepSeek-OCR and Qwen3-VL, alongside improved Docker integration.

Migration Steps

  1. Update Unsloth and, if you use it, the Unsloth Docker image to the latest release.
  2. To update Unsloth without touching its dependencies, run `pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo`.
  3. If you require PyTorch 2.9, use `pip install --upgrade unsloth unsloth_zoo` instead (a quick version check follows below).
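
After upgrading, you can confirm which versions are installed with Python's standard `importlib.metadata`; this is a convenience sketch, not part of the release notes.

```python
# Optional post-upgrade check (not from the release notes):
# print the installed versions of unsloth and unsloth_zoo.
from importlib.metadata import version

for pkg in ("unsloth", "unsloth_zoo"):
    print(pkg, version(pkg))
```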

✨ New Features

  • FP8 Reinforcement Learning support, enabling training on any FP8-capable GPU with a 1.4x speedup and a 60% VRAM reduction (see the sketch after this list).
  • Fine-tuning support for DeepSeek-OCR models, with a reported 89% improvement in language understanding.
  • Support for Qwen3-VL models, including GGUFs for local execution.
  • Integration with Docker for zero-setup local LLM execution via Unsloth Dynamic GGUFs (e.g., `docker model run hf.co/unsloth/gpt-oss-20b-GGUF:F16`).
  • Support for Baidu ERNIE models.
  • Support for SGLang inference engine.
  • New guides available for LoRA Hot Swapping and vLLM Engine Arguments.
  • Support for running Kimi-K2-Thinking models locally.
  • Int64 kernels and RoPE-embedding fixes to enable ultra-long-context training.
  • FP8 + RL training enabled for bf16 models.
  • 128x128 per-block FP8 + RL support.
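
The sketch below illustrates the FP8 + RL loading path named above. It is a minimal sketch, not the release's verbatim API: the `load_in_fp8` flag, the model name, and the hyperparameters are assumptions drawn from the FP8 announcement, so verify them against the current Unsloth docs.

```python
# Minimal sketch: loading a model for FP8 + RL fine-tuning with Unsloth.
# Assumptions (not confirmed by this changelog): the `load_in_fp8` flag
# and the example model name. Verify against the current Unsloth docs.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-8B",  # placeholder model
    max_seq_length = 2048,
    load_in_fp8 = True,      # assumed flag from the FP8 RL announcement
    fast_inference = True,   # vLLM-backed rollouts for RL
)

# Attach LoRA adapters for RL fine-tuning (e.g., GRPO via TRL).
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
)
```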

🐛 Bug Fixes

  • Fixed excessive re-compilations for gpt-oss GRPO/RL on torch>=2.9.0.
  • Reduced RL/GRPO memory usage by a further 5-15% via sleep-mode fixes.
  • Fixed propagation of the `trust_remote_code = True` argument (see the sketch after this list).
  • Fixed Unsloth offloaded gradient checkpointing failing to offload on the 1st step, reducing VRAM by >20%.
  • Added `logits.detach()` to GRPO to avoid double backward passes on some code paths.
  • Fixed OpenEnv gpt-oss RL notebook.
  • Fixed DGX Spark docker image.
  • Fixed Qwen3 VL gradient accumulation.
  • Fixed the `LlamaModel_fast_forward` signature to match HF Transformers (supports `inputs_embeds`).
  • Added the `trust_remote_code` parameter to the tokenizer.
  • Fixed GRPO gradient accumulation issues.
  • Fixed the gpt-oss memory calculation for Intel devices in unsloth_zoo.
  • Fixed an UnboundLocalError when loading the tokenizer/model from cache in unsloth_zoo.
  • Fixed Gemma3n in unsloth_zoo.
  • Fixed a `rope_embedding` AssertionError by checking `kv_seq_len` before reuse.
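
To illustrate the `trust_remote_code` fixes above, here is a hedged sketch of passing the flag through Unsloth's loader; the model name is a hypothetical placeholder.

```python
# Sketch: passing trust_remote_code through Unsloth's loader, which the
# fixes above now propagate to both the model and the tokenizer.
# The model name is a hypothetical placeholder. trust_remote_code runs
# code from the model repository, so enable it only for trusted models.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "your-org/custom-model",  # hypothetical placeholder
    max_seq_length = 2048,
    trust_remote_code = True,
)
```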

🔧 Affected Symbols

gpt-oss GRPO · RL · DeepSeek-OCR · Qwen3-VL · LlamaModel_fast_forward · RoPE embeddings · tokenizer