v0.19.0
📦 peft
✨ 17 features · 🔧 1 symbol
Summary
This PEFT release introduces nine new parameter-efficient fine-tuning methods, including GraLoRA, BD-LoRA, and Cartridges, alongside numerous enhancements like LoRA conversion tools and improved support for Transformer Engine and Tensor Parallelism.
Migration Steps
- If you rely on weight tying between embedding and LM head layers, consider passing `ensure_weight_tying=True` to your PEFT config to ensure tying is upheld.
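The migration step above amounts to one extra argument when building the config. A minimal sketch, assuming a standard `LoraConfig` workflow (the rank and target module names below are placeholders, not recommendations):

```python
# Hedged sketch: opting in to strict weight tying when creating a PEFT config.
# `ensure_weight_tying` is the flag introduced in this release.
from peft import LoraConfig

config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],  # placeholder module names
    # Keep tied embedding / LM head weights tied after wrapping the model.
    ensure_weight_tying=True,
)
```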
✨ New Features
- Added GraLoRA method for granular low-rank adaptation by subdividing base weights into smaller blocks.
- Added BD-LoRA method implementing block-diagonal LoRA weights to reduce communication overhead during tensor parallelism serving.
- Added Cartridges method for training a prefix that compresses long contexts into a compact representation.
- Added PVeRA method, an extension of VeRA that adds a probabilistic element by sampling from the shared parameters.
- Added PSOFT method for efficient orthogonal fine-tuning by constraining adaptation to a low-rank principal subspace.
- Added Lily method featuring a sophisticated parameter sharing scheme where A parameters are shared blockwise and B parameters are globally shared via a router.
- Added PEANuT method, which adds small weight-aware neural networks ("tweakers") to the base model, increasing expressivity or reducing parameter count.
- Added TinyLoRA method, which trains an extremely small number of parameters and is particularly effective in reinforcement learning.
- Added AdaMSS method, which segments base weights into subspaces and dynamically assigns smaller parameter budgets to less important subspaces.
- Added functionality to convert checkpoints of many non-LoRA methods into LoRA checkpoints.
- Added LoRA-GA initialization method to align LoRA gradients with full fine-tuning for faster convergence.
- Added utility function `reduce_intruder_dimension` to remove intruder dimensions from LoRA fine-tuned models, reducing forgetting.
- Added support for NVIDIA's Transformer Engine quantization method.
- Added support for Tensor Parallelism to LoRA.
- Improved user experience for weight tying (e.g., embedding and LM head) by allowing users to pass `ensure_weight_tying=True` to the PEFT config.
- Enabled LoRA to work with base models using very low precision floats like `torch.float8_e4m3fn`.
- Introduced zero initialization for PrefixTuning.
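To illustrate the idea behind GraLoRA (this is a conceptual sketch, not PEFT's implementation, and all names below are hypothetical): the base weight is subdivided into a grid of blocks, each carrying its own low-rank update, so the adapter operates at a finer granularity than a single LoRA pair.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 8, 8          # base weight shape
n = 2                # subdivide the weight into an n x n grid of blocks
r = 2                # rank of each block's low-rank adapter
bd, bk = d // n, k // n

W = rng.normal(size=(d, k))  # frozen base weight

# One (B, A) low-rank pair per block, as in plain LoRA but at block granularity.
# B starts at zero, mirroring standard LoRA initialization.
adapters = [[(np.zeros((bd, r)), rng.normal(size=(r, bk))) for _ in range(n)]
            for _ in range(n)]

def gralora_delta():
    """Assemble the full-weight update from the per-block low-rank products."""
    delta = np.zeros_like(W)
    for i in range(n):
        for j in range(n):
            B, A = adapters[i][j]
            delta[i * bd:(i + 1) * bd, j * bk:(j + 1) * bk] = B @ A
    return delta

# With every B initialized to zero, the adapted weight equals the base weight,
# so the wrapped model starts out behaving identically to the original.
assert np.allclose(W + gralora_delta(), W)
```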
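The LoRA-GA initialization mentioned above can also be sketched in isolation: take an SVD of the full-model gradient and use its leading singular directions to seed A and B, so the first LoRA update moves in approximately the same direction as full fine-tuning. This is a hedged conceptual sketch under assumed shapes, not PEFT's code (the real method also handles scaling and keeps the initial output unchanged):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 16, 12, 4

# Gradient of the loss w.r.t. the frozen base weight W (stand-in values here).
grad_W = rng.normal(size=(d, k))

# SVD of the gradient; keep only the top-r singular directions.
U, S, Vt = np.linalg.svd(grad_W, full_matrices=False)

# One LoRA-GA-style split: B from left singular vectors, A from right ones,
# so that B @ A spans the dominant subspace of the gradient.
B = U[:, :r]   # shape (d, r)
A = Vt[:r, :]  # shape (r, k)

# By Eckart-Young, this rank-r product (with singular-value scaling) is the
# best rank-r approximation of the gradient.
approx = B @ np.diag(S[:r]) @ A
err = np.linalg.norm(grad_W - approx) / np.linalg.norm(grad_W)
print(f"relative error of rank-{r} gradient approximation: {err:.3f}")
```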