Change 8

September-2025-v2

📦 unsloth · View on GitHub →
✨ 8 features · 🐛 12 fixes · 🔧 6 symbols

Summary

This release introduces major performance enhancements and new capabilities for Vision models in Reinforcement Learning (RL), alongside the new 'Standby' feature for memory-efficient training. Numerous bug fixes and improvements were also integrated across various components, including Intel/ROCm support and serialization workflows.

Migration Steps

  1. If you rely on GPU splitting for RL training/inference, consider adopting the new 'Unsloth Standby' feature to simplify setup and potentially improve performance (see the sketch below).
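
A minimal sketch of opting into Standby for single-GPU RL, assuming the `UNSLOTH_VLLM_STANDBY` environment variable and the `fast_inference`/`gpu_memory_utilization` parameters shown here; the model name and memory settings are illustrative:

```python
# Hedged sketch: enable Unsloth Standby so training and vLLM rollouts share one GPU
# without a manual memory split. UNSLOTH_VLLM_STANDBY is assumed to be the opt-in
# switch; set it before importing unsloth.
import os
os.environ["UNSLOTH_VLLM_STANDBY"] = "1"

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",  # illustrative model choice
    max_seq_length=2048,
    load_in_4bit=True,
    fast_inference=True,          # vLLM-backed generation for RL rollouts
    gpu_memory_utilization=0.9,   # can stay high; no manual train/inference split
)
```

With Standby enabled, the GPU no longer has to be partitioned between training and inference, which is the setup change this migration step refers to.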

✨ New Features

  • Support for vision models in Reinforcement Learning (RL), such as Gemma 3 and Qwen2.5-VL, with a 1.5–2× speedup, 90% less VRAM usage, and 10× longer context lengths compared to FA2 setups (see the sketch after this list).
  • Introduction of Qwen's GSPO algorithm for RL, shown in the same sketch.
  • New RL feature called 'Standby' which eliminates the need to split the GPU between training and inference, minimizing speed degradation.
  • Faster and more memory-efficient RL kernels and algorithms for text and vision LLMs, resulting in 50% less VRAM and 10× more context.
  • Support for saving models locally via `model.save_pretrained_torchao` (see the serialization sketch after this list).
  • Support for QAT (Quantization Aware Training) full fine-tuning.
  • Fast inference with vLLM for VLMs (see the vision RL sketch after this list).
  • Support for `modules_to_save` in `FastModel.get_peft_model` (see the serialization sketch after this list).
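
A hedged sketch of vision RL with GSPO, assuming TRL's `GRPOTrainer`/`GRPOConfig` expose the `importance_sampling_level` option (sequence-level ratios correspond to GSPO) and that `FastVisionModel` accepts `fast_inference` for vLLM rollouts; the dataset, reward function, and hyperparameters are placeholders:

```python
# Hedged sketch: GSPO-style RL on a vision model. Adjust names and keywords to
# the APIs in your installed unsloth/TRL versions.
from unsloth import FastVisionModel
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2.5-VL-7B-Instruct",  # illustrative vision model
    load_in_4bit=True,
    fast_inference=True,               # vLLM-backed rollouts for the VLM
)
model = FastVisionModel.get_peft_model(model, r=16, lora_alpha=16)

def reward_fn(completions, **kwargs):
    # Placeholder reward: mildly favors longer completions; replace with a real reward.
    return [min(len(str(c)) / 200.0, 1.0) for c in completions]

config = GRPOConfig(
    output_dir="outputs",
    importance_sampling_level="sequence",  # sequence-level ratios, i.e. GSPO
    num_generations=4,
    max_completion_length=256,
    per_device_train_batch_size=2,
)

trainer = GRPOTrainer(
    model=model,
    args=config,
    processing_class=tokenizer,
    train_dataset=train_dataset,  # placeholder: a prompt + image dataset defined elsewhere
    reward_funcs=[reward_fn],
)
trainer.train()
```

And a hedged serialization sketch covering `modules_to_save` in `FastModel.get_peft_model` plus local saving with `model.save_pretrained_torchao`; the torchao config class and the `torchao_config` keyword are assumptions, so check the signature in your installed version:

```python
# Hedged sketch: keep selected modules trainable in full precision via modules_to_save,
# then save a TorchAO-quantized checkpoint locally. Int8WeightOnlyConfig and the
# torchao_config keyword are assumptions; substitute the scheme/signature you use.
from unsloth import FastModel
from torchao.quantization import Int8WeightOnlyConfig

model, tokenizer = FastModel.from_pretrained("unsloth/gemma-3-4b-it")  # illustrative model
model = FastModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    modules_to_save=["lm_head", "embed_tokens"],  # trained and saved in full precision
)

# ... fine-tune the model here ...

model.save_pretrained_torchao(
    "gemma-3-finetune-torchao",             # local output directory
    tokenizer,
    torchao_config=Int8WeightOnlyConfig(),  # assumed keyword and config class
)
```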
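These sketches are illustrative configurations under the stated assumptions, not verbatim examples from this release; the release notes themselves only announce the features listed above.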

🐛 Bug Fixes

  • GPT-OSS bug fixes.
  • Tests for mxfp4 and quantized-model merge fixes (unsloth-zoo PR 254).
  • Update `mistral.py` to expose a flag that skips Cut Cross-Entropy.
  • Fix an incorrect function call in `test_qwen3_grpo.py`.
  • [Intel] Add RoPE support on Intel devices.
  • Fix `save_pretrained_torchao` and its associated tests.
  • Patch `sftrainer` to disable `_is_vlm`.
  • Filter the vLLM executor log.
  • Fix Llama Vision inference.
  • Turn training off in `GptAttention` during inference.
  • Simplify `unsloth_base_fast_generate`.
  • [ROCm] Add the HIP device path.

🔧 Affected Symbols

  • `mistral.py`
  • `test_qwen3_grpo.py`
  • `model.save_pretrained_torchao`
  • `FastModel.get_peft_model`
  • `sftrainer`
  • `GptAttention`