Change 8

September-2025-v2

📦 unsloth · View on GitHub →
✨ 8 features · 🐛 12 fixes · 🔧 6 symbols

Summary

This release introduces major performance enhancements and new capabilities for Vision models in Reinforcement Learning (RL), alongside the new 'Standby' feature for memory-efficient training. Numerous bug fixes and improvements were also integrated across various components, including Intel/ROCm support and serialization workflows.

Migration Steps

  1. If you rely on GPU splitting for RL training/inference, consider adopting the new 'Unsloth Standby' feature to simplify setup and potentially improve performance (see the sketch below).
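
A minimal sketch of opting into Standby for single-GPU RL, assuming the `UNSLOTH_VLLM_STANDBY` environment variable and the `fast_inference`/`gpu_memory_utilization` parameters shown here; the model name and memory settings are illustrative:

```python
# Hedged sketch: enable Unsloth Standby so training and vLLM rollouts share one GPU
# without a manual memory split. UNSLOTH_VLLM_STANDBY is assumed to be the opt-in
# switch; set it before importing unsloth.
import os
os.environ["UNSLOTH_VLLM_STANDBY"] = "1"

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",  # illustrative model choice
    max_seq_length=2048,
    load_in_4bit=True,
    fast_inference=True,          # vLLM-backed generation for RL rollouts
    gpu_memory_utilization=0.9,   # can stay high; no manual train/inference split
)
```

With Standby enabled, the GPU no longer has to be partitioned between training and inference, which is the setup change this migration step refers to.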

✨ New Features

  • Support for vision models in Reinforcement Learning (RL), such as Gemma 3 and Qwen2.5-VL, with a 1.5–2× speedup, 90% less VRAM usage, and 10× longer context lengths compared to FA2 setups (see the sketch after this list).
  • Introduction of Qwen's GSPO algorithm for RL, shown in the same sketch.
  • New RL feature called 'Standby' which eliminates the need to split the GPU between training and inference, minimizing speed degradation.
  • Faster and more memory-efficient RL kernels and algorithms for text and vision LLMs, resulting in 50% less VRAM and 10× more context.
  • Support for saving models locally via `model.save_pretrained_torchao` (see the serialization sketch after this list).
  • Support for QAT (Quantization Aware Training) full fine-tuning.
  • Fast inference with vLLM for VLMs (see the vision RL sketch after this list).
  • Support for `modules_to_save` in `FastModel.get_peft_model` (see the serialization sketch after this list).
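
A hedged sketch of vision RL with GSPO, assuming TRL's `GRPOTrainer`/`GRPOConfig` expose the `importance_sampling_level` option (sequence-level ratios correspond to GSPO) and that `FastVisionModel` accepts `fast_inference` for vLLM rollouts; the dataset, reward function, and hyperparameters are placeholders:

```python
# Hedged sketch: GSPO-style RL on a vision model. Adjust names and keywords to
# the APIs in your installed unsloth/TRL versions.
from unsloth import FastVisionModel
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2.5-VL-7B-Instruct",  # illustrative vision model
    load_in_4bit=True,
    fast_inference=True,               # vLLM-backed rollouts for the VLM
)
model = FastVisionModel.get_peft_model(model, r=16, lora_alpha=16)

def reward_fn(completions, **kwargs):
    # Placeholder reward: mildly favors longer completions; replace with a real reward.
    return [min(len(str(c)) / 200.0, 1.0) for c in completions]

config = GRPOConfig(
    output_dir="outputs",
    importance_sampling_level="sequence",  # sequence-level ratios, i.e. GSPO
    num_generations=4,
    max_completion_length=256,
    per_device_train_batch_size=2,
)

trainer = GRPOTrainer(
    model=model,
    args=config,
    processing_class=tokenizer,
    train_dataset=train_dataset,  # placeholder: a prompt + image dataset defined elsewhere
    reward_funcs=[reward_fn],
)
trainer.train()
```

And a hedged serialization sketch covering `modules_to_save` in `FastModel.get_peft_model` plus local saving with `model.save_pretrained_torchao`; the torchao config class and the `torchao_config` keyword are assumptions, so check the signature in your installed version:

```python
# Hedged sketch: keep selected modules trainable in full precision via modules_to_save,
# then save a TorchAO-quantized checkpoint locally. Int8WeightOnlyConfig and the
# torchao_config keyword are assumptions; substitute the scheme/signature you use.
from unsloth import FastModel
from torchao.quantization import Int8WeightOnlyConfig

model, tokenizer = FastModel.from_pretrained("unsloth/gemma-3-4b-it")  # illustrative model
model = FastModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    modules_to_save=["lm_head", "embed_tokens"],  # trained and saved in full precision
)

# ... fine-tune the model here ...

model.save_pretrained_torchao(
    "gemma-3-finetune-torchao",             # local output directory
    tokenizer,
    torchao_config=Int8WeightOnlyConfig(),  # assumed keyword and config class
)
```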
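These sketches are illustrative configurations under the stated assumptions, not verbatim examples from this release; the release notes themselves only announce the features listed above.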

🐛 Bug Fixes

  • GPT-OSS bug fixes.
  • Tests for mxfp4 and quantized-model merge fixes (unsloth-zoo PR 254).
  • Update `mistral.py` to expose a flag that skips Cut Cross-Entropy.
  • Fix an incorrect function call in `test_qwen3_grpo.py`.
  • [Intel] Add RoPE support on Intel devices.
  • Fix `save_pretrained_torchao` and its associated tests.
  • Patch `sftrainer` to disable `_is_vlm`.
  • Filter the vLLM executor log.
  • Fix Llama Vision inference.
  • Turn training off in `GptAttention` during inference.
  • Simplify `unsloth_base_fast_generate`.
  • [ROCm] Add the HIP device path.

🔧 Affected Symbols

  • `mistral.py`
  • `test_qwen3_grpo.py`
  • `model.save_pretrained_torchao`
  • `FastModel.get_peft_model`
  • `sftrainer`
  • `GptAttention`