November 2025
📦 unsloth
✨ 11 features · 🐛 15 fixes · 🔧 7 symbols
Summary
This release introduces major performance enhancements with FP8 Reinforcement Learning support and significant VRAM reductions across the board. It also adds support for new models like DeepSeek-OCR and Qwen3-VL, alongside improved Docker integration.
Migration Steps
- Update both Unsloth and the Docker image to pick up these changes.
- Update Unsloth via `pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo`.
- If you require PyTorch 2.9, use `pip install --upgrade unsloth unsloth_zoo` instead (both commands are repeated in the snippet below).
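For convenience, the same upgrade commands as a copy-pasteable shell snippet:

```bash
# Standard upgrade: refresh unsloth and unsloth_zoo without touching
# pinned dependencies such as your existing PyTorch install.
pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo

# If you need PyTorch 2.9, let pip resolve dependencies as well.
pip install --upgrade unsloth unsloth_zoo
```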
✨ New Features
- Introduction of FP8 Reinforcement Learning support, enabling training on any FP8-capable GPU with a 1.4x speedup and a 60% VRAM reduction (see the first sketch after this list).
- Fine-tuning support for DeepSeek-OCR models, with a reported 89% improvement in language understanding after fine-tuning.
- Support for Qwen3-VL models, including GGUFs for local execution (loading sketch after this list).
- Docker integration for zero-setup local LLM execution via Unsloth Dynamic-powered GGUFs (e.g., `docker model run hf.co/unsloth/gpt-oss-20b-GGUF:F16`).
- Support for Baidu ERNIE models.
- Support for SGLang inference engine.
- New guides available for LoRA Hot Swapping and vLLM Engine Arguments.
- Support for running Kimi-K2-Thinking models locally.
- Implementation of int64 kernels and RoPE-embedding fixes to enable extremely long-context training.
- FP8 + RL training enabled for bf16 models.
- Addition of 128x128 PerBlock FP8 + RL support.
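A minimal sketch of loading a model for FP8 RL/GRPO training. The `load_in_fp8` flag follows Unsloth's FP8 RL announcement, but treat the exact argument names and the model ID here as assumptions and verify them against the current docs:

```python
from unsloth import FastLanguageModel

# Assumed flag from the FP8 RL announcement: loads FP8 weights for
# RL/GRPO (~1.4x faster, ~60% less VRAM on FP8-capable GPUs).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "unsloth/Qwen3-8B",  # placeholder model ID
    max_seq_length = 2048,
    load_in_fp8    = True,   # assumption: check the docs for the exact name
    fast_inference = True,   # vLLM-backed generation for RL rollouts
    max_lora_rank  = 32,
)
```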
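Similarly, a hedged sketch of loading Qwen3-VL for fine-tuning via `FastVisionModel`; the checkpoint name is illustrative, so substitute a real `unsloth/Qwen3-VL-*` repo from the Hugging Face Hub:

```python
from unsloth import FastVisionModel

# Hypothetical checkpoint name, for illustration only.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen3-VL-8B-Instruct",
    load_in_4bit = True,  # 4-bit QLoRA-style loading
)

# Attach LoRA adapters to both the vision and language towers.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers   = True,
    finetune_language_layers = True,
    r          = 16,
    lora_alpha = 16,
)
```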
🐛 Bug Fixes
- Fixed excessive re-compilations for gpt-oss GRPO/RL on `torch>=2.9.0`.
- Reduced RL/GRPO memory usage by a further 5–15% via Sleep-mode fixes.
- Fixed propagation of the `trust_remote_code = True` argument (see the sketch after the fixes list).
- Fixed Unsloth's offloaded gradient checkpointing failing to offload on the first step, reducing VRAM by more than 20% (also shown in that sketch).
- Added `logits.detach()` in GRPO to prevent a double backward pass on some code paths.
- Fixed the OpenEnv gpt-oss RL notebook.
- Fixed the DGX Spark Docker image.
- Fixed Qwen3-VL gradient accumulation.
- Fixed the `LlamaModel_fast_forward` signature to match HF Transformers (now supports `inputs_embeds`).
- Added the `trust_remote_code` parameter to the tokenizer.
- Fixed GRPO gradient accumulation issues.
- Fixed gpt-oss memory calculation for Intel devices in `unsloth_zoo`.
- Fixed an unbound-local error when loading a tokenizer/model from cache in `unsloth_zoo`.
- Fixed Gemma3n support in `unsloth_zoo`.
- Fixed a `rope_embedding` AssertionError by checking `kv_seq_len` before reuse.
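Two of the fixed code paths in one hedged sketch: `trust_remote_code` now propagating to both the model and the tokenizer, and Unsloth's offloaded gradient checkpointing offloading from the first step. The model ID is a placeholder:

```python
from unsloth import FastLanguageModel

# `trust_remote_code = True` is now forwarded to the model and the
# tokenizer alike (placeholder model ID; use any remote-code repo).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name        = "unsloth/Llama-3.2-3B-Instruct",
    max_seq_length    = 4096,
    load_in_4bit      = True,
    trust_remote_code = True,
)

# Unsloth's offloaded gradient checkpointing, which now offloads
# correctly from the first training step onward (>20% VRAM saving).
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",
)
```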
🔧 Affected Symbols
gpt-oss GRPO · RL · DeepSeek-OCR · Qwen3-VL · `LlamaModel_fast_forward` · RoPE embeddings · tokenizer