Changelog

v1.6.0

📦 accelerate
✨ 5 features · 🐛 10 fixes · 🔧 5 symbols

Summary

This release introduces major features, including FSDPv2 support and initial DeepSpeed Tensor Parallelism (TP) support, and adds an XCCL distributed backend for XPU devices.

Migration Steps

  1. If using Python code, set `fsdp_version=2` in `FullyShardedDataParallelPlugin`: `fsdp_plugin = FullyShardedDataParallelPlugin(fsdp_version=2)` (see the Python sketch after this list).
  2. To convert a YAML config from FSDPv1 to FSDPv2, use the conversion tool: `accelerate to-fsdp2 --config_file config.yaml --output_file new_config.yaml`.
  3. To use TP with DeepSpeed, update the DeepSpeed config file by adding the `tensor_parallel` key: `"tensor_parallel": {"autotp_size": ${autotp_size}}` (see the config sketch after this list).
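
For step 1, here is a minimal Python sketch of enabling FSDPv2 programmatically. The toy model, optimizer, and hyperparameters are placeholders, and the script is assumed to be started with `accelerate launch` so a distributed environment exists.

```python
import torch
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Enable FSDPv2 by setting fsdp_version=2; all other plugin options keep their defaults.
fsdp_plugin = FullyShardedDataParallelPlugin(fsdp_version=2)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

# A toy model and optimizer, just to show the usual prepare() call.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer = accelerator.prepare(model, optimizer)
```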
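
For step 3, here is a sketch of the same `tensor_parallel` entry expressed as a Python dict and passed through `DeepSpeedPlugin(hf_ds_config=...)`. The `autotp_size` of 4 and the batch-size key are illustrative placeholders for values your existing DeepSpeed config already defines.

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Illustrative config fragment; merge the tensor_parallel key into your real DeepSpeed config.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "tensor_parallel": {"autotp_size": 4},
}

deepspeed_plugin = DeepSpeedPlugin(hf_ds_config=ds_config)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)
```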

✨ New Features

  • Introduced support for FSDPv2 by setting `fsdp_version=2` in `FullyShardedDataParallelPlugin`.
  • Added initial support for DeepSpeed + Tensor Parallelism (TP).
  • Added support for XCCL distributed backend for XPU devices.
  • Added `log_artifact`, `log_artifacts`, and `log_figure` capabilities to the MLflowTracker (see the sketch after this list).
  • Added `no_ssh` and `slurm` multi-node launcher options for DeepSpeed.
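
Below is a brief sketch of the new MLflow logging hooks mentioned above. The experiment name, figure contents, and artifact file name are illustrative, and `log_figure` is assumed to mirror `mlflow.log_figure(figure, artifact_file)`.

```python
import matplotlib.pyplot as plt
from accelerate import Accelerator

accelerator = Accelerator(log_with="mlflow")
accelerator.init_trackers(project_name="demo-run")  # experiment name is illustrative

# Fetch the MLflowTracker registered above.
tracker = accelerator.get_tracker("mlflow")

# Log a figure directly through the tracker (assumed to wrap mlflow.log_figure).
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [1.0, 0.5, 0.25])
tracker.log_figure(fig, "loss_curve.png")

accelerator.end_training()
```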

🐛 Bug Fixes

  • Fixed gradient norm clipping in FSDP2 (see the sketch after this list).
  • Fixed an attribute issue with DeepSpeed TP.
  • Fixed a typo in the multi-node FSDP slurm example script.
  • Removed the device index workaround on XPU, since XPU now supports integer device indices like CUDA.
  • Enabled two unit-test cases on XPU.
  • Fixed AMD GPU support in `should_reduce_batch_size()`.
  • Fixed device KeyError in `tied_params_map`.
  • Fixed seeding of the new generator for multi-GPU runs.
  • Fixed `get_balanced_memory` for MPS.
  • Fixed DeepSpeed dependency for MLU.
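
The gradient-clipping fix above is exercised through `Accelerator.clip_grad_norm_`, the supported way to clip gradients when the model is sharded. A minimal sketch follows; the toy model, `max_norm=1.0`, and the assumption that the script runs under `accelerate launch` with FSDPv2 enabled are all illustrative.

```python
import torch
from accelerate import Accelerator, FullyShardedDataParallelPlugin

accelerator = Accelerator(fsdp_plugin=FullyShardedDataParallelPlugin(fsdp_version=2))
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = accelerator.prepare(model, optimizer)

x = torch.randn(8, 16, device=accelerator.device)
loss = model(x).pow(2).mean()
accelerator.backward(loss)

# Clip via the Accelerator so the norm is computed over the sharded gradients.
accelerator.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```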

🔧 Affected Symbols

  • `FullyShardedDataParallelPlugin`
  • `MLflowTracker`
  • `should_reduce_batch_size()`
  • `tied_params_map`
  • `get_balanced_memory`