📦 accelerate v1.3.0

⚠️ 1 breaking · ✨ 2 features · 🐛 10 fixes · 🔧 6 symbols

Summary

This release makes PyTorch 2.0 the minimum required version, improves handling of compiled models and TPU execution, and includes a number of bug fixes across device support and offloading.

⚠️ Breaking Changes

  • Accelerate now requires PyTorch 2.0 or newer. Users on older PyTorch versions must upgrade before moving to this release (a quick environment check follows).
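
A minimal sketch of a startup check for the new requirement, assuming the `packaging` package is installed:

```python
# Fail fast if the installed PyTorch predates the new minimum.
import torch
from packaging.version import parse

assert parse(torch.__version__) >= parse("2.0"), (
    f"accelerate v1.3.0 requires PyTorch >= 2.0, found {torch.__version__}"
)
```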

✨ New Features

  • Added a `keep_torch_compile` parameter to `unwrap_model` and `extract_model_from_parallel` to better handle compiled models in distributed setups (first sketch below).
  • Added an example demonstrating how to handle gradient accumulation with a cross-entropy loss (second sketch below).
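
The new flag can be exercised directly; a minimal sketch, assuming a plain `torch.compile`d module (the model itself is a stand-in):

```python
import torch
from accelerate.utils import extract_model_from_parallel

model = torch.nn.Linear(8, 2)
compiled = torch.compile(model)

# keep_torch_compile=False strips the compile wrapper and returns the
# original nn.Linear underneath.
plain = extract_model_from_parallel(compiled, keep_torch_compile=False)

# keep_torch_compile=True preserves the compiled wrapper, which matters
# when the module is also wrapped in a distributed container such as DDP.
still_compiled = extract_model_from_parallel(compiled, keep_torch_compile=True)
```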
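
The gradient-accumulation example targets a subtle point: with a token-level cross-entropy loss, normalizing each micro-batch by its own token count skews the accumulated gradient. A hedged sketch of the idea (all names here are illustrative, not the example's actual code):

```python
import torch.nn.functional as F

def accumulation_step(model, optimizer, accelerator, micro_batches):
    # Normalize by the token count of the whole accumulated batch, not of
    # each micro-batch, so the update matches a single large-batch step.
    total_tokens = sum((mb["labels"] != -100).sum() for mb in micro_batches)
    optimizer.zero_grad()
    for mb in micro_batches:
        logits = model(mb["input_ids"])
        loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            mb["labels"].view(-1),
            ignore_index=-100,
            reduction="sum",   # sum per-token losses...
        ) / total_tokens       # ...then divide by the global token count
        accelerator.backward(loss)
    optimizer.step()
```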

🐛 Bug Fixes

  • Fixed an issue with `load_state_dict` when running on NPU devices.
  • Resolved an error triggered by the latest bitsandbytes (`bnb`) releases, caused by a missing `optim_args` attribute on the optimizer.
  • Added a version check for `torchdata` to prevent an `in_order` error during dataloading (see the version-guard sketch after this list).
  • Fixed the dataloader logic to check that `in_order` exists in `kwargs` before attempting to drop it.
  • Removed `nprocs` argument from `xla.spawn` for TPU execution.
  • Fixed an offloading issue when using TorchAO >= 0.7.0.
  • Fixed tests related to offload generation.
  • Ensured that tied parameters are correctly identified as children of the module.
  • Corrected the return statement in `_init_infer_auto_device_map`.
  • Updated XPU memory information retrieval to use `torch.xpu.mem_get_info` (see the sketch after this list).
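
A sketch of the version-guard pattern behind the two `torchdata`/`in_order` fixes above; the version threshold is illustrative, not the exact value the fix checks:

```python
from packaging.version import parse

def sanitize_dataloader_kwargs(kwargs):
    # Drop "in_order" when the installed torchdata is too old to accept
    # it; pop with a default so a missing key is not an error.
    try:
        import torchdata
    except ImportError:
        return kwargs
    if parse(torchdata.__version__) < parse("0.11.0"):  # illustrative cutoff
        kwargs.pop("in_order", None)
    return kwargs
```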
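
For reference, `torch.xpu.mem_get_info` mirrors `torch.cuda.mem_get_info` and returns free and total memory in bytes; a minimal sketch:

```python
import torch

if torch.xpu.is_available():
    # (free_bytes, total_bytes) for XPU device 0
    free, total = torch.xpu.mem_get_info(0)
    print(f"XPU 0: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")
```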

🔧 Affected Symbols

`unwrap_model` · `extract_model_from_parallel` · `load_state_dict` · `xla.spawn` · `_init_infer_auto_device_map` · `torch.xpu.mem_get_info`