Changelog

November 2025

📦 unsloth
✨ 11 features · 🐛 15 fixes · 🔧 7 symbols

Summary

This release introduces major performance enhancements with FP8 Reinforcement Learning support and significant VRAM reductions across the board. It also adds support for new models like DeepSeek-OCR and Qwen3-VL, alongside improved Docker integration.

Migration Steps

  1. Update Unsloth and, if you use it, the Unsloth Docker image to the latest release.
  2. To update Unsloth without touching its dependencies, run `pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo`.
  3. If you require PyTorch 2.9, use `pip install --upgrade unsloth unsloth_zoo` instead (a quick version check follows below).
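
After upgrading, you can confirm which versions are installed with Python's standard `importlib.metadata`; this is a convenience sketch, not part of the release notes.

```python
# Optional post-upgrade check (not from the release notes):
# print the installed versions of unsloth and unsloth_zoo.
from importlib.metadata import version

for pkg in ("unsloth", "unsloth_zoo"):
    print(pkg, version(pkg))
```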

✨ New Features

  • FP8 Reinforcement Learning support, enabling training on any FP8-capable GPU with a 1.4x speedup and a 60% VRAM reduction (see the sketch after this list).
  • Fine-tuning support for DeepSeek-OCR models, with a reported 89% improvement in language understanding.
  • Support for Qwen3-VL models, including GGUFs for local execution.
  • Integration with Docker for zero-setup local LLM execution via Unsloth Dynamic GGUFs (e.g., `docker model run hf.co/unsloth/gpt-oss-20b-GGUF:F16`).
  • Support for Baidu ERNIE models.
  • Support for SGLang inference engine.
  • New guides available for LoRA Hot Swapping and vLLM Engine Arguments.
  • Support for running Kimi-K2-Thinking models locally.
  • Int64 kernels and RoPE-embedding fixes to enable ultra-long-context training.
  • FP8 + RL training enabled for bf16 models.
  • 128x128 per-block FP8 + RL support.
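
The sketch below illustrates the FP8 + RL loading path named above. It is a minimal sketch, not the release's verbatim API: the `load_in_fp8` flag, the model name, and the hyperparameters are assumptions drawn from the FP8 announcement, so verify them against the current Unsloth docs.

```python
# Minimal sketch: loading a model for FP8 + RL fine-tuning with Unsloth.
# Assumptions (not confirmed by this changelog): the `load_in_fp8` flag
# and the example model name. Verify against the current Unsloth docs.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-8B",  # placeholder model
    max_seq_length = 2048,
    load_in_fp8 = True,      # assumed flag from the FP8 RL announcement
    fast_inference = True,   # vLLM-backed rollouts for RL
)

# Attach LoRA adapters for RL fine-tuning (e.g., GRPO via TRL).
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
)
```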

🐛 Bug Fixes

  • Fixed excessive re-compilations for gpt-oss GRPO/RL on torch>=2.9.0.
  • Reduced RL/GRPO memory usage by a further 5-15% via sleep-mode fixes.
  • Fixed propagation of the `trust_remote_code = True` argument (see the sketch after this list).
  • Fixed Unsloth offloaded gradient checkpointing failing to offload on the 1st step, reducing VRAM by >20%.
  • Added `logits.detach()` to GRPO to avoid double backward passes on some code paths.
  • Fixed OpenEnv gpt-oss RL notebook.
  • Fixed DGX Spark docker image.
  • Fixed Qwen3 VL gradient accumulation.
  • Fixed the `LlamaModel_fast_forward` signature to match HF Transformers (supports `inputs_embeds`).
  • Added the `trust_remote_code` parameter to the tokenizer.
  • Fixed GRPO gradient accumulation issues.
  • Fixed the gpt-oss memory calculation for Intel devices in unsloth_zoo.
  • Fixed an UnboundLocalError when loading the tokenizer/model from cache in unsloth_zoo.
  • Fixed Gemma3n in unsloth_zoo.
  • Fixed a `rope_embedding` AssertionError by checking `kv_seq_len` before reuse.
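
To illustrate the `trust_remote_code` fixes above, here is a hedged sketch of passing the flag through Unsloth's loader; the model name is a hypothetical placeholder.

```python
# Sketch: passing trust_remote_code through Unsloth's loader, which the
# fixes above now propagate to both the model and the tokenizer.
# The model name is a hypothetical placeholder. trust_remote_code runs
# code from the model repository, so enable it only for trusted models.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "your-org/custom-model",  # hypothetical placeholder
    max_seq_length = 2048,
    trust_remote_code = True,
)
```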

🔧 Affected Symbols

gpt-oss GRPO · RL · DeepSeek-OCR · Qwen3-VL · LlamaModel_fast_forward · RoPE embeddings · tokenizer