v1.3.0
📦 accelerate
⚠ 1 breaking · ✨ 2 features · 🐛 10 fixes · 🔧 6 symbols
Summary
This release enforces PyTorch 2.0 as the minimum required version, improves handling of compiled models and TPU execution, and ships bug fixes across device support, dataloading, and offloading.
⚠️ Breaking Changes
- Accelerate now requires PyTorch 2.0 or higher. Users on older PyTorch versions must upgrade before installing this release.
✨ New Features
- Added a `keep_torch_compile` parameter to the `unwrap_model` and `extract_model_from_parallel` functions to better handle distributed compiled models (see the first sketch after this list).
- Added an example demonstrating how to handle gradient accumulation with cross-entropy loss (see the second sketch after this list).
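A minimal sketch of the new parameter, assuming the semantics implied by its name (the shipped default and exact unwrapping behavior may differ):

```python
import torch
from accelerate.utils import extract_model_from_parallel

model = torch.nn.Linear(8, 8)
compiled = torch.compile(model)

# Assumed semantics: keep_torch_compile=False strips the torch.compile
# wrapper and returns the original module, which is the usual choice before
# saving a state dict; True keeps the compiled wrapper intact.
plain = extract_model_from_parallel(compiled, keep_torch_compile=False)
```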
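For the gradient-accumulation example, the underlying pitfall is that taking a per-micro-batch mean of cross-entropy and then dividing by the number of steps weights tokens unevenly when micro-batches contain different numbers of valid labels. A self-contained toy sketch of the token-weighted approach (not the shipped example; the model and data here are made up):

```python
import torch
import torch.nn.functional as F

# Toy classifier standing in for a real model; -100 marks ignored labels,
# matching PyTorch's cross_entropy convention.
model = torch.nn.Linear(16, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

micro_batches = [
    (torch.randn(4, 16), torch.tensor([1, 2, -100, 3])),
    (torch.randn(4, 16), torch.tensor([0, -100, -100, 5])),
]

# Normalize by the valid-token count of the WHOLE accumulation window, not
# per micro-batch, so every token contributes equally to the gradient.
total_valid = sum((labels != -100).sum().item() for _, labels in micro_batches)

optimizer.zero_grad()
for inputs, labels in micro_batches:
    logits = model(inputs)
    loss = F.cross_entropy(logits, labels, reduction="sum", ignore_index=-100)
    (loss / total_valid).backward()  # gradients accumulate across steps
optimizer.step()
```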
🐛 Bug Fixes
- Fixed an issue with `load_state_dict` when running on NPU devices.
- Resolved an error with recent `bitsandbytes` (`bnb`) releases caused by the `optim_args` attribute missing on the optimizer.
- Added a version check for `torchdata` to prevent an "in_order" error during dataloading.
- Fixed dataloader logic to check whether `in_order` exists in kwargs before attempting to drop it (the general pattern is shown in the first sketch after this list).
- Removed `nprocs` argument from `xla.spawn` for TPU execution.
- Fixed an issue related to offloading when using TorchAO version >= 0.7.0.
- Fixed tests related to offload generation.
- Ensured that tied parameters are correctly identified as children of the module.
- Corrected the return statement in `_init_infer_auto_device_map`.
- Updated XPU memory information retrieval to use `torch.xpu.mem_get_info` (see the second sketch after this list).
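The two `in_order` fixes above boil down to a defensive capability check before forwarding kwargs. A minimal illustrative sketch against `torch.utils.data.DataLoader` (not Accelerate's actual code; the kwargs are hypothetical):

```python
import inspect
from torch.utils.data import DataLoader

dataloader_kwargs = {"batch_size": 8, "in_order": True}  # hypothetical kwargs

# `in_order` is only accepted by newer DataLoader versions; drop it when the
# installed build does not support it, instead of raising a TypeError.
if "in_order" not in inspect.signature(DataLoader.__init__).parameters:
    dataloader_kwargs.pop("in_order", None)
```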
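For the XPU change, `torch.xpu.mem_get_info` is assumed to mirror `torch.cuda.mem_get_info` and return `(free_bytes, total_bytes)` for a device; a guarded sketch:

```python
import torch

# Query free/total device memory on Intel XPU; guarded so the snippet also
# runs on machines without an XPU device.
if hasattr(torch, "xpu") and torch.xpu.is_available():
    free, total = torch.xpu.mem_get_info()
    print(f"XPU memory: {free} free / {total} total bytes")
else:
    print("No XPU device available on this machine.")
```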
🔧 Affected Symbols
`unwrap_model`, `extract_model_from_parallel`, `load_state_dict`, `xla.spawn`, `_init_infer_auto_device_map`, `torch.xpu.mem_get_info`