
v0.19.0

📦 peft — 17 features, 1 affected symbol

Summary

This PEFT release introduces nine new parameter-efficient fine-tuning methods, including GraLoRA, BD-LoRA, and Cartridges, alongside numerous enhancements like LoRA conversion tools and improved support for Transformer Engine and Tensor Parallelism.

Migration Steps

  1. If you rely on weight tying between the embedding and LM head layers, consider passing `ensure_weight_tying=True` to your PEFT config so that the tying is preserved.
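To make the migration step concrete, here is a minimal sketch of what weight tying means in PyTorch: the LM head reuses the embedding matrix, so both layers share one underlying tensor. The `TinyLM` model below is illustrative only, not part of PEFT; the `ensure_weight_tying=True` usage shown in the comment is the option this release introduces.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Illustrative model with tied embedding and LM head weights."""
    def __init__(self, vocab_size=10, dim=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lm_head = nn.Linear(dim, vocab_size, bias=False)
        # Tie the weights: both modules now point at the same tensor.
        self.lm_head.weight = self.embed.weight

model = TinyLM()
assert model.lm_head.weight is model.embed.weight  # same tensor object

# Wrapping such a model with PEFT can silently break this tie if an adapter
# targets only one of the tied modules. A hedged sketch of the new option:
#
#   from peft import LoraConfig
#   config = LoraConfig(target_modules=[...], ensure_weight_tying=True)
```

Updating the embedding in place and observing the same change in the LM head is a quick way to verify the tie still holds after wrapping.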

✨ New Features

  • Added GraLoRA method for granular low-rank adaptation by subdividing base weights into smaller blocks.
  • Added BD-LoRA method implementing block-diagonal LoRA weights to reduce communication overhead during tensor parallelism serving.
  • Added Cartridges method for training a prefix that compresses long contexts into a much more compact representation.
  • Added PVeRA method, an extension of VeRA that adds a probabilistic element by sampling from shared parameters.
  • Added PSOFT method for efficient orthogonal fine-tuning by constraining adaptation to a low-rank principal subspace.
  • Added Lily method featuring a sophisticated parameter sharing scheme where A parameters are shared blockwise and B parameters are globally shared via a router.
  • Added PEANuT method, which adds small, weight-aware neural tweakers (lightweight networks) to the base model, increasing expressivity or lowering the parameter count.
  • Added TinyLoRA method allowing training of an extremely small number of parameters, particularly effective in reinforcement learning.
  • Added AdaMSS method, which segments base weights into subspaces and assigns the parameter budget dynamically, giving fewer parameters to less important subspaces.
  • Added functionality to convert checkpoints of many non-LoRA methods into LoRA checkpoints.
  • Added LoRA-GA initialization method to align LoRA gradients with full fine-tuning for faster convergence.
  • Added utility function `reduce_intruder_dimension` to remove intruder dimensions in LoRA fine-tuned models to reduce forgetting.
  • Added support for NVIDIA's Transformer Engine quantization method.
  • Added support for Tensor Parallelism to LoRA.
  • Improved user experience for weight tying (e.g., embedding and LM head) by allowing users to pass `ensure_weight_tying=True` to the PEFT config.
  • Enabled LoRA to work with base models using very low precision floats like `torch.float8_e4m3fn`.
  • Introduced zero initialization for PrefixTuning.
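To illustrate one of the features above, here is a hedged sketch of the block-diagonal idea behind BD-LoRA: instead of one dense low-rank pair, the update is split into independent per-block factors, so under tensor parallelism each shard can apply its own block without cross-device communication. Shapes and names are illustrative assumptions, not PEFT's internal API.

```python
import torch

torch.manual_seed(0)
d, r, n_blocks = 8, 4, 2          # hidden size, total LoRA rank, TP shards
bd, br = d // n_blocks, r // n_blocks  # per-block dimension and rank

# One (A_i, B_i) low-rank pair per block, as in plain LoRA but smaller.
A = [torch.randn(br, bd) for _ in range(n_blocks)]
B = [torch.randn(bd, br) for _ in range(n_blocks)]

# The combined weight update is block-diagonal: each shard contributes
# only to its own slice of the output, so no all-gather is needed.
delta_w = torch.block_diag(*[B[i] @ A[i] for i in range(n_blocks)])

assert delta_w.shape == (d, d)
# Off-diagonal blocks are exactly zero by construction.
assert torch.all(delta_w[:bd, bd:] == 0)
assert torch.all(delta_w[bd:, :bd] == 0)
```

Each `B[i] @ A[i]` product has rank at most `br`, so the total trainable parameter count matches a rank-`r` LoRA while keeping the update shardable.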

Affected Symbols