August-2025
📦 unsloth
✨ 8 features · 🐛 19 fixes · 🔧 19 symbols
Summary
This release introduces broad support for the new gpt-oss model, enabling low-VRAM fine-tuning, alongside significant algorithmic updates that improve performance across all models. It also adds support for Qwen3 models and extends compatibility to newer NVIDIA hardware, including the RTX 50 series and Blackwell GPUs.
Migration Steps
- If using GRPO, note that an argument mismatch in `_get_per_token_logps` was fixed; verify any code that calls it directly.
- If using Falcon H1, verify inference compatibility, as patches were applied.
- If using LoRA, ensure `lora_dropout` is configured as a float, not an integer.
- If you encountered quantized-model merge errors with Qwen2.5-VL-32B-Instruct, note that the mapping was adjusted (and partially reverted); verify current merge behavior.
- Users on Intel architectures may need to verify paths related to `llama.py`.
- Users running inference on Llama or Gemma should verify functionality due to specific fixes.
- Users on multi-GPU setups should verify workload distribution.
- Users relying on `get_per_token_logps_and_entropies` should update their code to expect a tuple return instead of a dictionary (see the sketch after this list).
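A minimal migration sketch for the tuple return change, written to tolerate both shapes; the `"logps"`/`"entropies"` dictionary keys are assumptions for illustration and may not match the old return exactly:

```python
from typing import Any, Tuple

def unpack_logps_and_entropies(result: Any) -> Tuple[Any, Any]:
    """Handle both the old dict return and the new tuple return.

    Newer releases return (logps, entropies); older ones returned a dict
    (the "logps"/"entropies" keys below are assumed for illustration).
    """
    if isinstance(result, dict):  # legacy behavior
        return result["logps"], result["entropies"]
    logps, entropies = result     # new behavior: plain tuple
    return logps, entropies
```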
✨ New Features
- Introduced support and fine-tuning capabilities for the new gpt-oss model, enabling training on as little as 14GB of VRAM (see the sketch after this list).
- Algorithmic updates to Unsloth resulting in faster training and lower VRAM usage across all models.
- Added support for training and inference on RTX 50 and Blackwell GPUs.
- Ability to run Unsloth models directly via Docker using `docker model pull hf.co/unsloth/gpt-oss-20b-GGUF`.
- Support for Qwen3-Coder and Qwen3-2507 models.
- Added support for running and training Kimi-K2, GLM (4.5-Air, 4.5, 4-32B-0414), Orpheus-3B, and Hunyuan-A13B.
- Added support for training Falcon-H1-7B and LFM2-1.2B.
- Added support for Devstral-2507, Magistral-2507, and SmolLM3-3B.
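A minimal low-VRAM fine-tuning setup sketch for gpt-oss with Unsloth's `FastLanguageModel`; the checkpoint name `unsloth/gpt-oss-20b` and the hyperparameters are illustrative assumptions, not the release's exact recipe. Note that `lora_dropout` is passed as a float, per the migration note above.

```python
from unsloth import FastLanguageModel

# Load gpt-oss in 4-bit to keep VRAM usage low (checkpoint name is an assumption).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; lora_dropout is a float, not an int.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```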
🐛 Bug Fixes
- Fixed argument mismatch in the GRPO `_get_per_token_logps` lambda function.
- Patched Falcon H1 inference.
- Fixed Falcon H1 dropout issue.
- Changed `lora_dropout` from int to float for type consistency.
- Fixed `dataloader_num_workers` value error in `GRPOTrainer` for GRPO.
- Fixed GRPO to support vLLM pre-dequantized quantization states in the `fast_dequantize` kernel.
- Fixed Qwen2.5-VL-32B-Instruct mapping to resolve a quantized model merge error (after an initial addition and subsequent reverts).
- Fixed causal mask issue.
- Added Intel path for `llama.py`.
- Fixed Gemma 2 issues.
- Forced Falcon H1 to use float32 when dtype is `torch.float16`.
- Fixed `torch.compile` issues.
- Fixed Llama and Gemma inference.
- Fixed multi GPU workload.
- Fixed issues related to Model Loading.
- Added gemma-3n chat template to `chat_templates.py` (see the usage sketch after this list).
- Added specific check for Gemma initialization to prevent BERT models from initializing incorrectly.
- Fixed RoPE sync for all components.
- Modified `get_per_token_logps_and_entropies` to return a tuple instead of a dict.
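A short usage sketch for the newly added gemma-3n chat template via `get_chat_template` from `unsloth.chat_templates`; the checkpoint name and the exact `"gemma-3n"` template identifier are assumptions for illustration.

```python
from transformers import AutoTokenizer
from unsloth.chat_templates import get_chat_template

# Any Gemma-3n tokenizer works here; the checkpoint name is illustrative.
tokenizer = AutoTokenizer.from_pretrained("unsloth/gemma-3n-E4B-it")

# Apply the gemma-3n template added to chat_templates.py in this release.
tokenizer = get_chat_template(tokenizer, chat_template="gemma-3n")

messages = [{"role": "user", "content": "Hello!"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```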
🔧 Affected Symbols
`_get_per_token_logps` (GRPO), Falcon H1 inference, `lora_dropout`, `GRPOTrainer` `dataloader_num_workers`, `fast_dequantize` kernel, Qwen2.5-VL-32B-Instruct mapping, causal mask, `llama.py` (Intel path), Gemma 2, Falcon H1 dtype handling, `torch.compile`, Llama inference, Gemma inference, multi-GPU workload, model loading logic, `chat_templates.py` (gemma-3n chat template), BERT initialization logic, RoPE sync, `get_per_token_logps_and_entropies`