September-2025-v2
📦 unsloth
✨ 8 features · 🐛 12 fixes · 🔧 6 symbols
Summary
This release introduces major performance enhancements and new capabilities for Vision models in Reinforcement Learning (RL), alongside the new 'Standby' feature for memory-efficient training. Numerous bug fixes and improvements were also integrated across various components, including Intel/ROCm support and serialization workflows.
Migration Steps
- If you rely on GPU splitting for RL training/inference, consider adopting the new 'Unsloth Standby' feature to simplify setup and potentially improve performance.
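A minimal sketch of opting in to Standby follows. The environment variable name `UNSLOTH_VLLM_STANDBY` and the requirement to set it before importing `unsloth` are assumptions drawn from the Unsloth documentation rather than from these release notes, and `enable_standby` is a hypothetical helper, so verify both against your installed version.

```python
import os
import sys

# Assumption: this variable name is not stated in the release notes.
# Check the Unsloth documentation for your installed version.
STANDBY_FLAG = "UNSLOTH_VLLM_STANDBY"


def enable_standby() -> None:
    """Set the Standby flag, refusing if unsloth was already imported.

    Hypothetical helper: it only guards the ordering assumption that the
    flag is read when unsloth is first imported.
    """
    if "unsloth" in sys.modules:
        raise RuntimeError("set the Standby flag before importing unsloth")
    os.environ[STANDBY_FLAG] = "1"


enable_standby()
# import unsloth  # import only after the flag is set
```

With the flag set, the model would then be loaded and trained as usual; the release notes do not specify any further configuration.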
✨ New Features
- Support for Vision models in Reinforcement Learning (RL) with models like Gemma 3 and Qwen2.5-VL, offering 1.5–2× speedup, 90% less VRAM usage, and 10× longer context lengths compared to FA2 setups.
- Introduction of Qwen's GSPO algorithm for RL.
- New RL feature called 'Standby' which eliminates the need to split the GPU between training and inference, minimizing speed degradation.
- Faster and more memory-efficient RL kernels and algorithms for text and vision LLMs, resulting in 50% less VRAM and 10× more context.
- Support for saving locally in `model.save_pretrained_torchao`.
- Support for QAT (Quantization Aware Training) full fine-tuning.
- Fast Inference with vLLM for VLMs.
- Support for modules_to_save in FastModel.get_peft_model.
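To illustrate the GSPO item above: GSPO replaces per-token importance ratios with a single length-normalized ratio per response. The sketch below shows that difference in plain Python. It is not Unsloth's implementation; the function names, the log-probability inputs, and the `eps` clipping value are all hypothetical.

```python
import math


def token_level_ratios(new_logps, old_logps):
    """Token-level weighting: one importance ratio per generated token."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_level_ratio(new_logps, old_logps):
    """GSPO-style weighting: one length-normalized ratio for the response.

    Equals (pi_new(y|x) / pi_old(y|x)) ** (1 / len(y)), computed in log
    space as exp(mean(new_logp - old_logp)) for numerical stability.
    """
    if not new_logps or len(new_logps) != len(old_logps):
        raise ValueError("log-prob lists must be non-empty and equal length")
    total = sum(n - o for n, o in zip(new_logps, old_logps))
    return math.exp(total / len(new_logps))


def clipped_objective(ratio, advantage, eps):
    """PPO-style clipped surrogate applied to a single ratio."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


# Hypothetical per-token log-probabilities for one sampled response.
old = [-1.5, -1.5, -2.0]
new = [-1.0, -1.0, -2.0]

per_token = token_level_ratios(new, old)   # three separate ratios
per_seq = sequence_level_ratio(new, old)   # one ratio for the response
loss_term = clipped_objective(per_seq, advantage=1.0, eps=0.2)
```

Because the whole response shares one ratio, clipping keeps or drops a response as a unit instead of token by token.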
🐛 Bug Fixes
- GPT-OSS bug fixes.
- Tests for the mxfp4 and quantized-model merge fix (unsloth zoo PR 254).
- Update mistral.py to show the flag for not calling cut cross entropy.
- Fix incorrect function call in test_qwen3_grpo.py.
- [Intel] Make Intel devices support RoPE.
- Fixed save_pretrained_torchao and associated tests.
- Patch sftrainer to disable _is_vlm.
- Filter vllm executor log.
- Llama vision inference fix.
- Turn training mode off in GptAttention during inference.
- Simplify unsloth_base_fast_generate.
- [ROCm] Add HIP device path.
🔧 Affected Symbols
mistral.py, test_qwen3_grpo.py, model.save_pretrained_torchao, FastModel.get_peft_model, sftrainer, GptAttention