2025-02-v2
📦 unsloth
✨ 3 features · 🐛 5 fixes · 🔧 5 symbols
Summary
This release introduces GRPO (Group Relative Policy Optimization) training with up to 90% less memory usage during training, adds support for Llama 3.1 8B GRPO training, and includes several bug fixes.
Migration Steps
- Update Unsloth via `pip install --upgrade --no-cache-dir unsloth unsloth_zoo` to utilize the new memory optimizations and features.
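A quick way to confirm the upgrade took effect is to import both packages and print their versions. This is only a convenience check and assumes both packages expose a `__version__` attribute:

```python
# Verify the upgraded packages import cleanly and report their versions.
# Assumes unsloth and unsloth_zoo both expose __version__.
import unsloth
import unsloth_zoo

print("unsloth:", unsloth.__version__)
print("unsloth_zoo:", unsloth_zoo.__version__)
```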
✨ New Features
- Introduced GRPO (Group Relative Policy Optimization), using up to 90% less memory during training compared to TRL + FA2.
- Logs for each individual reward function are now shown during GRPO training.
- Added support for Llama 3.1 8B GRPO training; a usage sketch follows below.
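The GRPO path builds on TRL's `GRPOTrainer` on top of unsloth's `FastLanguageModel`. The sketch below is a minimal illustration only, not a recommended recipe: the model name, LoRA settings, toy dataset, reward function, and hyperparameters are assumptions chosen for brevity.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastLanguageModel

# Load Llama 3.1 8B in 4-bit with unsloth's fast (vLLM-backed) generation.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B-Instruct",  # illustrative choice
    max_seq_length=1024,
    load_in_4bit=True,
    fast_inference=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Toy prompt-only dataset; GRPO samples multiple completions per prompt.
dataset = Dataset.from_list([
    {"prompt": "Explain gradient checkpointing in one sentence."},
    {"prompt": "What is LoRA fine-tuning?"},
] * 32)

# Hypothetical reward function. Each reward function passed to GRPOTrainer is
# now logged separately during training (the new per-reward logging).
def reward_brevity(completions, **kwargs):
    return [-len(c) / 100.0 for c in completions]

args = GRPOConfig(
    output_dir="grpo-llama31-8b",
    use_vllm=True,              # generate rollouts with vLLM
    per_device_train_batch_size=8,
    num_generations=8,          # completions sampled per prompt
    max_prompt_length=256,
    max_completion_length=256,
    learning_rate=5e-6,
    max_steps=50,
    logging_steps=1,
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[reward_brevity],
    args=args,
    train_dataset=dataset,
)
trainer.train()
```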
🐛 Bug Fixes
- Fixed several bugs in the GRPO implementation.
- Fixed Triton URL in README.md.
- Fixed a llama-quantize error on Windows (WSL) related to GGUF saving in save.py.
- Fixed an import error.
- Fixed Gemma mask conversion to float.
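The save.py fix affects the GGUF export path, which shells out to llama.cpp's llama-quantize. A minimal sketch of exercising that path, continuing from a `model`/`tokenizer` pair already loaded with unsloth (as in the sketch above); the output directory and quantization method are illustrative:

```python
# Export to GGUF; this invokes llama-quantize under the hood, the step where
# the Windows WSL issue was fixed. Directory and method are example values.
model.save_pretrained_gguf(
    "llama31-8b-gguf",           # hypothetical output directory
    tokenizer,
    quantization_method="q4_k_m",
)
```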
🔧 Affected Symbols
- `FastLanguageModel`
- `GRPOConfig`
- `GRPOTrainer`
- `vLLM SamplingParams`
- `save.py`