Change8

2025-02-v2

📦 unsloth
✨ 3 features · 🐛 5 fixes · 🔧 5 symbols

Summary

This release introduces GRPO, achieving up to 90% memory reduction during training, alongside various bug fixes and updates to support Llama 3.1 8B training.

Migration Steps

  1. Update Unsloth via `pip install --upgrade --no-cache-dir unsloth unsloth_zoo` to utilize the new memory optimizations and features.

✨ New Features

  • Introduced GRPO (Group Relative Policy Optimization), resulting in up to 90% less memory usage during training compared to TRL + FA2.
  • Reward logs for each individual reward function are now shown during GRPO training.
  • Added support for Llama 3.1 8B GRPO training (a minimal usage sketch follows this list).
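
The GRPO additions touch `FastLanguageModel`, `GRPOConfig`, `GRPOTrainer`, and `vLLMSamplingParams`. The sketch below shows what a Llama 3.1 8B GRPO run might look like after upgrading; the dataset, reward function, and hyperparameter values are illustrative assumptions, not values taken from this release.

```python
# Minimal GRPO sketch (illustrative; dataset, reward function, and
# hyperparameters are assumptions, not values from this release).
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

# Load Llama 3.1 8B with Unsloth's fast-inference (vLLM) path enabled.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
    fast_inference=True,      # vLLM-backed generation for GRPO rollouts
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Example dataset with a "prompt" column, as GRPOTrainer expects.
dataset = load_dataset("trl-lib/tldr", split="train")

# Hypothetical reward function; per-function reward logs now appear
# during training (see the feature list above).
def length_reward(completions, **kwargs):
    # Reward completions whose length is close to ~200 characters.
    return [-abs(len(c) - 200) / 200 for c in completions]

training_args = GRPOConfig(
    output_dir="outputs",
    per_device_train_batch_size=4,
    num_generations=4,        # completions sampled per prompt
    max_prompt_length=256,
    max_completion_length=256,
    max_steps=50,
    learning_rate=5e-6,
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[length_reward],
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```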

🐛 Bug Fixes

  • Fixed bugs in the GRPO implementation.
  • Fixed Triton URL in README.md.
  • Fixed a llama-quantize error on Windows WSL related to GGUF saving in save.py.
  • Fixed an import error.
  • Fixed Gemma mask conversion to float.

🔧 Affected Symbols

`FastLanguageModel`, `GRPOConfig`, `GRPOTrainer`, `vLLMSamplingParams`, `save.py`