Changelog

September-2025-v3

📦 unsloth

Summary

This release introduces significant performance gains and new capabilities for gpt-oss Reinforcement Learning, alongside support for new models such as DeepSeek-V3.1-Terminus and Magistral 1.2. It also ships several bug fixes, covering BERT loading and the QAT + LoRA fast path.

⚠️ Breaking Changes

  • Transformers inference code was rewritten to enable faster RL inference for gpt-oss, since gpt-oss is not yet vLLM-compatible. Users who relied on the previous inference implementation for gpt-oss RL may see changes in behavior or performance characteristics.

Migration Steps

  1. Ensure you have the latest version of transformers installed to support fine-tuning Qwen3 models.
  2. Do NOT use Flash Attention 3 when training gpt-oss models, as it will result in incorrect training loss.
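For step 1, a quick programmatic check can catch a stale transformers install before a fine-tuning run fails. Below is a minimal sketch; the `"4.56.0"` threshold is illustrative only, not the documented minimum for Qwen3 support — consult the transformers release notes for the actual required version.

```python
# Sketch: verify the installed transformers version before fine-tuning
# the new Qwen3 models. The default minimum version is an assumption.
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v: str) -> tuple:
    """Parse a dotted version string into a comparable tuple of ints."""
    parts = []
    for p in v.split("."):
        if p.isdigit():
            parts.append(int(p))
        else:
            break  # stop at pre-release tags like "dev0"
    return tuple(parts)


def transformers_is_at_least(minimum: str = "4.56.0") -> bool:
    """Return True if transformers is installed and >= `minimum`."""
    try:
        return version_tuple(version("transformers")) >= version_tuple(minimum)
    except PackageNotFoundError:
        return False  # transformers not installed at all


print(transformers_is_at_least())
```

Comparing versions as integer tuples avoids the classic string-comparison pitfall where `"4.9" > "4.56"`; for production code, the `packaging.version` module handles pre-release tags more rigorously.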

✨ New Features

  • Introduction of gpt-oss RL support, offering the fastest inference (~3x faster), lowest VRAM (50% less), and most context (8x longer) compared to other implementations.
  • Release of DeepSeek-V3.1-Terminus, available locally via GGUF.
  • Release of Magistral 1.2, available for local use or fine-tuning.
  • Support for fine-tuning new Qwen3 models including Qwen3-VL, Qwen3-Omni, and Qwen3-Next (requires latest transformers).

🐛 Bug Fixes

  • Fixed BERT loading and general functionality.
  • Fixed the QAT + LoRA fast path.
  • Applied the gemma3n embedder patch and adjusted the FORCE_FLOAT32 match logic.

🔧 Affected Symbols

  • Transformers inference code (rewritten for gpt-oss RL)
  • BERT