v0.16.0
📦 peft
Summary
This release introduces two new PEFT methods, RandLoRA and C3A, along with the LoRA-FA optimizer, Quantization-Aware LoRA (QALoRA) training, and broader layer compatibility for LoRA and DoRA. It also includes critical compatibility updates for recent changes in the Hugging Face Transformers library.
⚠️ Breaking Changes
- The Orthogonal Finetuning (OFT) implementation was refactored and is incompatible with old OFT checkpoints. Users must either pin the PEFT version to <0.16.0 or retrain their checkpoints with the new PEFT version.
- Due to a Transformers VLM refactor, PEFT prompt learning methods applied to `vlm.language_model` will no longer work; apply them directly to `vlm` instead.
- Loading checkpoints trained after the Transformers VLM refactor is not possible with PEFT versions before this release; upgrade both PEFT and transformers.
- Prefix tuning may show numerical differences due to attention mask refactors in Transformers. If performance is degraded, especially for models using 4d attention masks like Gemma, re-train checkpoints or pin PEFT to <0.16.0 and transformers to <4.52.0.
Migration Steps
- If using OFT checkpoints trained with older PEFT versions, either pin PEFT to `<0.16.0` or retrain the checkpoints using this new PEFT version.
- If using PEFT prompt learning methods on Vision Language Models (VLMs) from Transformers, change the target from `vlm.language_model` to `vlm` directly (see the sketch after this list).
- If using prompt learning methods (especially prefix tuning) and observing numerical differences or performance issues after upgrading Transformers, re-train the affected checkpoints or pin PEFT to `<0.16.0` and transformers to `<4.52.0`.
- When using LoRA with `Conv2d` layers where `groups != 1`, ensure the rank `r` is divisible by `groups`.
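A minimal sketch of the VLM migration, assuming a LLaVA-style image-text model and prompt tuning; the checkpoint name, model class, and hyperparameters are illustrative, not taken from the release notes:

```python
from transformers import AutoModelForImageTextToText
from peft import PromptTuningConfig, TaskType, get_peft_model

# Illustrative checkpoint; any Transformers VLM with a `language_model` submodule applies.
vlm = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-1.5-7b-hf")

config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)

# Before the Transformers VLM refactor (no longer works):
# peft_model = get_peft_model(vlm.language_model, config)

# After: apply the prompt learning method to the whole VLM.
peft_model = get_peft_model(vlm, config)
```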
✨ New Features
- Added the LoRA-FA optimizer for increased memory efficiency during LoRA training (see the first sketch after this list).
- Introduced RandLoRA, a new PEFT method that combines non-learnable random low-rank matrices with learnable matrices to approximate full-rank updates.
- Added Circular Convolution Adaptation (C3A), a new PEFT method designed to overcome the limits of low-rank adaptation while remaining fast and memory-efficient.
- LoRA now supports `Conv2d` layers where `groups != 1`, provided the rank `r` is divisible by `groups` (see the second sketch after this list).
- Added support for Intel Neural Compressor (INC) quantization to LoRA.
- DoRA now supports `Conv1d` layers.
- Passing `init_lora_weights="orthogonal"` now enables orthogonal weight initialization for LoRA.
- Introduced Quantization-Aware LoRA (QALoRA) training for more efficient QLoRA training (currently only supports GPTQ).
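A minimal sketch of the LoRA-FA optimizer, assuming it is exposed as a `create_lorafa_optimizer` helper in `peft.optimizers` (mirroring the existing `create_loraplus_optimizer`); the base model and hyperparameter values are illustrative:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
# Assumption: LoRA-FA is exposed via this helper; check the PEFT LoRA
# documentation for the exact import path and signature.
from peft.optimizers import create_lorafa_optimizer

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # illustrative model
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base_model, config)

# LoRA-FA keeps the LoRA A matrices frozen and trains only B,
# which reduces memory usage during training.
optimizer = create_lorafa_optimizer(model=model, r=16, lora_alpha=32, lr=7e-5)
```

And a sketch of LoRA on a grouped `Conv2d` layer, using a hypothetical toy module; the layer sizes and rank are illustrative, the only requirement being that `r` is divisible by `groups`:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class TinyConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        # groups=4, so the LoRA rank r must be divisible by 4
        self.conv = nn.Conv2d(16, 32, kernel_size=3, groups=4)

    def forward(self, x):
        return self.conv(x)

model = TinyConvNet()
config = LoraConfig(r=8, target_modules=["conv"])  # r=8 is divisible by groups=4
peft_model = get_peft_model(model, config)
```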
🐛 Bug Fixes
- Fixed issues with multiple PEFT methods when models are loaded in float16 or bfloat16.
- Fixed deletion of adapters on auxiliary modules.
- Fixed error when merging LoRA bias with scale != 1.
- Fixed X-LoRA error when targeting different modules.
- Fixed regression accessing `modules_to_save`.
- Fixed a faulty test that resulted in NaN weights.
🔧 Affected Symbols
LoRA, DoRA, OFT, Prefix Tuning, `inject_adapter`, `vlm.language_model`
⚡ Deprecations
- The `evaluation_strategy` training argument is deprecated in favor of `eval_strategy` (related to #2487); see the sketch below.
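A minimal sketch of the rename, assuming the deprecation refers to the `evaluation_strategy` argument of `transformers.TrainingArguments` as used in PEFT's training examples:

```python
from transformers import TrainingArguments

# Before (deprecated):
# args = TrainingArguments(output_dir="out", evaluation_strategy="epoch")

# After: `eval_strategy` replaces the deprecated argument.
args = TrainingArguments(output_dir="out", eval_strategy="epoch")
```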