Change8

2025-02-v2

📦 unsloth
✨ 3 features · 🐛 5 fixes · 🔧 5 symbols

Summary

This release introduces GRPO, achieving up to 90% memory reduction during training, alongside various bug fixes and updates to support Llama 3.1 8B training.

Migration Steps

  1. Update Unsloth via `pip install --upgrade --no-cache-dir unsloth unsloth_zoo` to utilize the new memory optimizations and features.

✨ New Features

  • Introduced GRPO (Group Relative Policy Optimization), resulting in up to 90% less memory usage during training compared to TRL + FA2.
  • Reward logs for each individual reward function are now shown during GRPO training.
  • Added support for Llama 3.1 8B GRPO training (a minimal usage sketch follows this list).
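
The GRPO additions touch `FastLanguageModel`, `GRPOConfig`, `GRPOTrainer`, and `vLLMSamplingParams`. The sketch below shows what a Llama 3.1 8B GRPO run might look like after upgrading; the dataset, reward function, and hyperparameter values are illustrative assumptions, not values taken from this release.

```python
# Minimal GRPO sketch (illustrative; dataset, reward function, and
# hyperparameters are assumptions, not values from this release).
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

# Load Llama 3.1 8B with Unsloth's fast-inference (vLLM) path enabled.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
    fast_inference=True,      # vLLM-backed generation for GRPO rollouts
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Example dataset with a "prompt" column, as GRPOTrainer expects.
dataset = load_dataset("trl-lib/tldr", split="train")

# Hypothetical reward function; per-function reward logs now appear
# during training (see the feature list above).
def length_reward(completions, **kwargs):
    # Reward completions whose length is close to ~200 characters.
    return [-abs(len(c) - 200) / 200 for c in completions]

training_args = GRPOConfig(
    output_dir="outputs",
    per_device_train_batch_size=4,
    num_generations=4,        # completions sampled per prompt
    max_prompt_length=256,
    max_completion_length=256,
    max_steps=50,
    learning_rate=5e-6,
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[length_reward],
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```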

🐛 Bug Fixes

  • Fixed bugs in the GRPO implementation.
  • Fixed Triton URL in README.md.
  • Fixed a llama-quantize error on Windows WSL related to GGUF saving in save.py.
  • Fixed an import error.
  • Fixed Gemma mask conversion to float.

🔧 Affected Symbols

`FastLanguageModel`, `GRPOConfig`, `GRPOTrainer`, `vLLMSamplingParams`, `save.py`