Changelog

August-2025-v2

📦 unsloth
✨ 7 features · 🐛 9 fixes · 🔧 6 symbols

Summary

This release introduces Unsloth Flex Attention for gpt-oss training, substantially improving context length, VRAM efficiency, and training speed. It also includes numerous bug fixes and support for new models and features such as QAT + LoRA.

Migration Steps

  1. Follow the updated vLLM installation instructions for Blackwell GPUs if using the latest vLLM release.

✨ New Features

  • Introduced Unsloth Flex Attention support for OpenAI gpt-oss training, enabling >8× longer context lengths, >50% less VRAM usage, and >1.5× faster training.
  • Unsloth Flex Attention allows training with 60K context length on 80GB VRAM for BF16 LoRA.
  • Added ability to export/save QLoRA fine-tuned gpt-oss models to llama.cpp, vLLM, or HF (see the sketch after this list).
  • Added support for Qwen3 Instruct / Thinking chat templates.
  • Added support for Qwen3 4B to mapper.py.
  • Added support for QAT + LoRA.
  • Allowed torch.float32 dtype in FastLanguageModel.
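
The items above map onto the usual Unsloth workflow. As a minimal sketch (not the release's verbatim example; the checkpoint name, context length, LoRA settings, and save paths are assumptions), loading gpt-oss for long-context QLoRA training and exporting afterwards looks roughly like this:

```python
import torch
from unsloth import FastLanguageModel

# Load gpt-oss for QLoRA training; Unsloth Flex Attention handles the long context.
# The checkpoint name and context length below are illustrative assumptions.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",   # assumed checkpoint name
    max_seq_length=60_000,              # long context enabled by Flex Attention
    dtype=None,                         # auto-detect; torch.float32 is now also accepted
    load_in_4bit=True,                  # QLoRA
)

# Attach LoRA adapters (rank and target modules are illustrative, not prescribed defaults).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# ... fine-tune with your trainer of choice (e.g. TRL's SFTTrainer) ...

# Export the fine-tuned model; output directory names are placeholders.
model.save_pretrained_merged("gpt-oss-finetuned", tokenizer)     # HF / vLLM
model.save_pretrained_gguf("gpt-oss-finetuned-gguf", tokenizer)  # llama.cpp
```

For the Qwen3 Instruct / Thinking templates, the usual pattern is `get_chat_template`; the template identifier and model id below are placeholders, not confirmed names:

```python
from transformers import AutoTokenizer
from unsloth.chat_templates import get_chat_template

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")  # assumed model id
tokenizer = get_chat_template(tokenizer, chat_template="qwen3-instruct")  # placeholder template name
```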

🐛 Bug Fixes

  • Fixed gpt-oss training losses going to infinity on float16 GPUs (like T4 Colab).
  • Fixed gpt-oss implementation issues, ensuring `swiglu_limit = 7.0` is properly applied during MXFP4 inference [in transformers](https://github.com/huggingface/transformers/pull/40197).
  • Fixed potential generator exhaustion bug in model loading file detection.
  • Fixed the `quantization_method` error type for vision model GGUF export.
  • Fixed the `original_push_to_hub` fallback.
  • Fixed a typo in the transformers extras in pyproject.toml.
  • Fixed the `is_causal` setting for Qwen3.
  • Fixed gemma-3n issues.
  • Handled transformers' move from `torch_dtype` to `dtype` (see the sketch after this list).
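
For context on that last fix: recent transformers releases prefer a `dtype=` argument over the older `torch_dtype=` in `from_pretrained`. A minimal sketch (the model id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Newer transformers versions accept `dtype=`; `torch_dtype=` is the legacy spelling.
model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",   # placeholder model id
    dtype=torch.bfloat16,    # formerly: torch_dtype=torch.bfloat16
)
```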

🔧 Affected Symbols

gpt-oss · `FastLanguageModel` · `swiglu_limit` · Qwen3 Instruct · Qwen3 4B · gemma-3n