
v0.15.0

📦 peft · View on GitHub →
✨ 6 features · 🐛 12 fixes · ⚡ 1 deprecation · 🔧 8 symbols

Summary

This release introduces significant new features, including CorDA initialization for LoRA and the Trainable Tokens tuner, alongside enhancements to LoRA targeting and hotswapping. It also deprecates PEFT_TYPE_TO_MODEL_MAPPING and replaces AutoGPTQ support with GPTQModel.

Migration Steps

  1. If relying on PEFT_TYPE_TO_MODEL_MAPPING, update code to use PEFT_TYPE_TO_TUNER_MAPPING instead.
  2. If using LoRA to target multihead attention modules, note that only modules with _qkv_same_embed_dim=True are currently supported.
  3. If using hotswapping with dynamic rank/alpha changes, call prepare_model_for_compiled_hotswap() before compiling the model (see the hotswapping sketch after this list).
  4. If using AutoGPTQ, migrate to GPTQModel.
  5. To use pattern matching in rank_pattern/alpha_pattern to target full module paths, prefix the pattern with a caret (^), e.g., '^foo' (see the rank_pattern sketch after this list).
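
A minimal sketch of step 3, assuming prepare_model_for_compiled_hotswap and hotswap_adapter live in peft.utils.hotswap and that target_rank is the keyword used to pad ranks; the adapter paths are placeholders, so check the hotswapping docs for the exact signatures:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel
from peft.utils.hotswap import hotswap_adapter, prepare_model_for_compiled_hotswap

base = AutoModelForCausalLM.from_pretrained("gpt2")  # any base model
# "adapter-a" / "adapter-b" are placeholder paths to two LoRA checkpoints.
model = PeftModel.from_pretrained(base, "adapter-a")

# Pad LoRA weights to a common rank and turn scalings into tensors so that a
# later adapter with a different rank/alpha does not force a recompile.
# (target_rank= is an assumed keyword argument; consult the hotswapping docs.)
prepare_model_for_compiled_hotswap(model, target_rank=64)

model = torch.compile(model)
# ... run inference with adapter-a ...

# Swap in the second adapter in place, without torch.compile recompilation.
hotswap_adapter(model, "adapter-b", adapter_name="default")
```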
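
And a sketch of step 5; the module name foo is a placeholder used only for illustration:

```python
from peft import LoraConfig

# Without the caret, "foo" matches any module whose name ends in "foo".
# With "^foo", the pattern is anchored at the start of the full module path,
# so only modules under the top-level "foo" submodule get the overrides.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "foo"],  # placeholder module names
    rank_pattern={"^foo": 32},    # rank 32 only for paths starting with "foo"
    alpha_pattern={"^foo": 64},   # alpha 64 for the same modules
)
```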

✨ New Features

  • Introduced CorDA (Context-Oriented Decomposition Adaptation) initialization method for LoRA, supporting knowledge-preservation and instruction-preservation modes.
  • Added the Trainable Tokens tuner for selectively training individual token embeddings, offering memory efficiency and smaller checkpoints; usable standalone or combined with LoRA (see the Trainable Tokens sketch after this list).
  • LoRA now supports targeting multihead attention modules (only those with _qkv_same_embed_dim=True).
  • Hotswapping now supports different alpha scalings and ranks without recompilation if the model is prepared via prepare_model_for_compiled_hotswap().
  • Added support for GPTQModel as a replacement for the unmaintained AutoGPTQ.
  • The 'all-linear' option for target_modules now also works for custom (non-transformers) models (see the 'all-linear' sketch after this list).
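
A minimal sketch of the Trainable Tokens tuner used standalone; the argument names and token indices below are taken as assumptions from the feature description, so verify them against the Trainable Tokens docs:

```python
from transformers import AutoModelForCausalLM
from peft import TrainableTokensConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Only the listed token embeddings are trained instead of the full embedding
# matrix, which keeps memory usage low and the checkpoint small.
# target_modules / token_indices names and values are illustrative.
config = TrainableTokensConfig(
    target_modules=["embed_tokens"],  # the model's input embedding layer
    token_indices=[0, 1, 2],          # placeholder token ids to make trainable
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```

For combined use with LoRA, the PEFT docs describe a trainable_token_indices option on LoraConfig that wires the same mechanism into a LoRA run.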
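
And a sketch of the 'all-linear' option on a plain torch module; the MLP class here is made up for illustration:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class MLP(nn.Module):  # a plain, non-transformers model
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(32, 64)
        self.act = nn.ReLU()
        self.lin2 = nn.Linear(64, 2)

    def forward(self, x):
        return self.lin2(self.act(self.lin1(x)))

# "all-linear" now resolves to the model's nn.Linear layers even outside
# transformers models; previously this shortcut required a transformers model.
config = LoraConfig(target_modules="all-linear")
peft_model = get_peft_model(MLP(), config)
peft_model.print_trainable_parameters()
```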

🐛 Bug Fixes

  • Fixed a bug where non-linear layers could be selected when using 'all-linear' if they shared a name substring with a linear layer.
  • Fixed an issue where modules_to_save keys could wrongly match parts of the state dict if the key was a substring of another key (e.g., 'classifier' matching 'classifier2').
  • Fixed device compatibility issues for BOFT forward/merging.
  • Added a warning when the chosen adapter_name conflicts with the tuner prefix.
  • Fixed adaption prompt errors following changes in transformers #35235.
  • Fixed low_cpu_mem_usage=True compatibility with 8bit bitsandbytes.
  • Fixed memory consumption for CorDA and improved related documentation.
  • Fixed generating with mixed adapter batches and beam search enabled.
  • Avoided needless copy from modules_to_save.
  • Fixed Prefix tuning tests with rotary embedding on multi-GPU.
  • Fixed package checks for torchao and EETQ.
  • Fixed missing attributes in MultiheadAttention.

🔧 Affected Symbols

LoraConfig, prepare_model_for_compiled_hotswap, GPTQModel, AutoGPTQ, PEFT_TYPE_TO_MODEL_MAPPING, PEFT_TYPE_TO_TUNER_MAPPING, modules_to_save, MultiheadAttention

⚡ Deprecations

  • PEFT_TYPE_TO_MODEL_MAPPING is deprecated and should be replaced by PEFT_TYPE_TO_TUNER_MAPPING (see the sketch below).
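
A minimal before/after sketch of the replacement; the exact import location of PEFT_TYPE_TO_TUNER_MAPPING is an assumption (shown here at the package root), so verify it against your PEFT version:

```python
from peft import PeftType

# Before (deprecated in this release; previously lived in peft.peft_model):
# from peft.peft_model import PEFT_TYPE_TO_MODEL_MAPPING
# tuner_cls = PEFT_TYPE_TO_MODEL_MAPPING[PeftType.LORA]

# After (import path assumed):
from peft import PEFT_TYPE_TO_TUNER_MAPPING
tuner_cls = PEFT_TYPE_TO_TUNER_MAPPING[PeftType.LORA]  # e.g. the LoRA tuner class
```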