v1.6.0
📦 accelerate · View on GitHub →
✨ 5 features · 🐛 10 fixes · 🔧 5 symbols
Summary
This release introduces major features, including FSDPv2 support and initial DeepSpeed Tensor Parallelism (TP) support, and adds the XCCL distributed backend for XPU devices.
Migration Steps
- If using Python code, set `fsdp_version=2` in `FullyShardedDataParallelPlugin`: `fsdp_plugin = FullyShardedDataParallelPlugin(fsdp_version=2)` (see the sketch after this list).
- To convert a YAML config from FSDPv1 to FSDPv2, use the conversion tool: `accelerate to-fsdp2 --config_file config.yaml --output_file new_config.yaml`.
- To use TP with DeepSpeed, update the DeepSpeed config file to include the `tensor_parallel` key: `"tensor_parallel": {"autotp_size": ${autotp_size}}`.
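
A minimal sketch of the Python route described above; the model, optimizer, and dataloader are placeholders, and only the `fsdp_version=2` argument is new relative to standard `Accelerator` usage:

```python
# Minimal sketch: opting into FSDPv2 from Python (placeholder model/optimizer/dataloader).
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(fsdp_version=2)  # FSDPv1 remains the default if unset
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

# model, optimizer, dataloader = ...  # defined elsewhere
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```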
✨ New Features
- Introduced support for FSDPv2 by setting `fsdp_version=2` in `FullyShardedDataParallelPlugin`.
- Added initial support for DeepSpeed + Tensor Parallelism (TP).
- Added support for XCCL distributed backend for XPU devices.
- Added `log_artifact`, `log_artifacts`, and `log_figure` capabilities to the `MLflowTracker` (sketched after this list).
- Added `no_ssh` and `slurm` multinode launcher options for DeepSpeed.
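
A hedged sketch of the new MLflow logging hooks; the method names come from the list above, but the signatures shown (mirroring `mlflow.log_figure(figure, artifact_file)` and `mlflow.log_artifact(local_path)`) are assumptions, not confirmed here:

```python
# Hedged sketch: logging a matplotlib figure through the MLflowTracker.
# Assumes log_figure/log_artifact mirror mlflow's own signatures.
import matplotlib.pyplot as plt
from accelerate import Accelerator

accelerator = Accelerator(log_with="mlflow")
accelerator.init_trackers("my_project")  # project name is a placeholder

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [1.0, 0.6, 0.4])  # dummy loss curve

tracker = accelerator.get_tracker("mlflow")
tracker.log_figure(fig, "loss_curve.png")        # assumed signature
# tracker.log_artifact("checkpoints/model.bin")  # assumed signature

accelerator.end_training()
```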
🐛 Bug Fixes
- Fixed gradient norm clipping in FSDP2.
- Fixed an attribute issue with DeepSpeed TP.
- Fixed a typo in the multi-node FSDP slurm example script.
- Removed the device index workaround on XPU, since XPU now supports integer device indices like CUDA.
- Enabled two unit test cases on XPU.
- Fixed `should_reduce_batch_size()` on AMD GPUs.
- Fixed device KeyError in `tied_params_map`.
- Fixed seeding of the new generator in multi-GPU setups.
- Fixed `get_balanced_memory` for MPS.
- Fixed DeepSpeed dependency for MLU.
🔧 Affected Symbols
- `FullyShardedDataParallelPlugin`
- `MLflowTracker`
- `should_reduce_batch_size()`
- `tied_params_map`
- `get_balanced_memory`