📦 accelerate v1.3.0

⚠️ 1 breaking · ✨ 2 features · 🐛 10 fixes · 🔧 6 symbols

Summary

This release makes PyTorch 2.0 the minimum required version, improves handling of compiled models and TPU execution, and includes a number of bug fixes across device support and offloading.

⚠️ Breaking Changes

  • Accelerate now requires PyTorch 2.0 or newer. Users on older PyTorch versions must upgrade before moving to this release (a quick environment check follows).
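
A minimal sketch of a startup check for the new requirement, assuming the `packaging` package is installed:

```python
# Fail fast if the installed PyTorch predates the new minimum.
import torch
from packaging.version import parse

assert parse(torch.__version__) >= parse("2.0"), (
    f"accelerate v1.3.0 requires PyTorch >= 2.0, found {torch.__version__}"
)
```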

✨ New Features

  • Added a `keep_torch_compile` parameter to `unwrap_model` and `extract_model_from_parallel` to better handle compiled models in distributed setups (first sketch below).
  • Added an example demonstrating how to handle gradient accumulation with a cross-entropy loss (second sketch below).
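
The new flag can be exercised directly; a minimal sketch, assuming a plain `torch.compile`d module (the model itself is a stand-in):

```python
import torch
from accelerate.utils import extract_model_from_parallel

model = torch.nn.Linear(8, 2)
compiled = torch.compile(model)

# keep_torch_compile=False strips the compile wrapper and returns the
# original nn.Linear underneath.
plain = extract_model_from_parallel(compiled, keep_torch_compile=False)

# keep_torch_compile=True preserves the compiled wrapper, which matters
# when the module is also wrapped in a distributed container such as DDP.
still_compiled = extract_model_from_parallel(compiled, keep_torch_compile=True)
```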
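
The gradient-accumulation example targets a subtle point: with a token-level cross-entropy loss, normalizing each micro-batch by its own token count skews the accumulated gradient. A hedged sketch of the idea (all names here are illustrative, not the example's actual code):

```python
import torch.nn.functional as F

def accumulation_step(model, optimizer, accelerator, micro_batches):
    # Normalize by the token count of the whole accumulated batch, not of
    # each micro-batch, so the update matches a single large-batch step.
    total_tokens = sum((mb["labels"] != -100).sum() for mb in micro_batches)
    optimizer.zero_grad()
    for mb in micro_batches:
        logits = model(mb["input_ids"])
        loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            mb["labels"].view(-1),
            ignore_index=-100,
            reduction="sum",   # sum per-token losses...
        ) / total_tokens       # ...then divide by the global token count
        accelerator.backward(loss)
    optimizer.step()
```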

🐛 Bug Fixes

  • Fixed an issue with `load_state_dict` when running on NPU devices.
  • Resolved an error triggered by the latest bitsandbytes (`bnb`) releases, caused by a missing `optim_args` attribute on the optimizer.
  • Added a version check for `torchdata` to prevent an `in_order` error during dataloading (see the version-guard sketch after this list).
  • Fixed the dataloader logic to check that `in_order` exists in `kwargs` before attempting to drop it.
  • Removed `nprocs` argument from `xla.spawn` for TPU execution.
  • Fixed an offloading issue when using TorchAO >= 0.7.0.
  • Fixed tests related to offload generation.
  • Ensured that tied parameters are correctly identified as children of the module.
  • Corrected the return statement in `_init_infer_auto_device_map`.
  • Updated XPU memory information retrieval to use `torch.xpu.mem_get_info` (see the sketch after this list).
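
A sketch of the version-guard pattern behind the two `torchdata`/`in_order` fixes above; the version threshold is illustrative, not the exact value the fix checks:

```python
from packaging.version import parse

def sanitize_dataloader_kwargs(kwargs):
    # Drop "in_order" when the installed torchdata is too old to accept
    # it; pop with a default so a missing key is not an error.
    try:
        import torchdata
    except ImportError:
        return kwargs
    if parse(torchdata.__version__) < parse("0.11.0"):  # illustrative cutoff
        kwargs.pop("in_order", None)
    return kwargs
```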
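
For reference, `torch.xpu.mem_get_info` mirrors `torch.cuda.mem_get_info` and returns free and total memory in bytes; a minimal sketch:

```python
import torch

if torch.xpu.is_available():
    # (free_bytes, total_bytes) for XPU device 0
    free, total = torch.xpu.mem_get_info(0)
    print(f"XPU 0: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")
```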

🔧 Affected Symbols

`unwrap_model` · `extract_model_from_parallel` · `load_state_dict` · `xla.spawn` · `_init_infer_auto_device_map` · `torch.xpu.mem_get_info`