v1.12.0
📦 accelerate · View on GitHub →
✨ 2 features · 🐛 2 fixes · 🔧 4 symbols
Summary
This release introduces a major integration with DeepSpeed Ulysses/ALST for sequence parallelism, enabling efficient training on long sequences. It also includes several minor fixes and documentation updates.
Migration Steps
- To enable DeepSpeed Ulysses, create a `ParallelismConfig` with `sp_backend="deepspeed"` and `sp_size` set, and pass `sp_handler=DeepSpeedSequenceParallelConfig(...)`; see the sketch after this list.
- When using DeepSpeed Ulysses, make sure the loss computation aggregates the per-rank losses across ranks with `torch.distributed.nn.functional.all_gather`, as described in the documentation.
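
A minimal sketch of both steps follows. The import paths for `ParallelismConfig` and `DeepSpeedSequenceParallelConfig`, the `sp_size=4` value, and the `aggregate_sp_loss` helper are assumptions for illustration (check the accelerate documentation for the exact locations and the recommended loss recipe); only the `sp_backend`, `sp_size`, and `sp_handler` fields come from this release.

```python
# Sketch: enable DeepSpeed Ulysses via ParallelismConfig, then aggregate the loss
# across sequence-parallel ranks. Import paths and aggregate_sp_loss are assumptions.
import torch
from torch.distributed.nn.functional import all_gather  # differentiable all_gather

from accelerate import Accelerator, ParallelismConfig            # assumed import path
from accelerate.utils import DeepSpeedSequenceParallelConfig     # assumed import path

# Step 1: route sequence parallelism through the DeepSpeed Ulysses/ALST backend.
pc = ParallelismConfig(
    sp_backend="deepspeed",                        # use the DeepSpeed Ulysses backend
    sp_size=4,                                     # ranks each sequence is sharded across
    sp_handler=DeepSpeedSequenceParallelConfig(),  # backend-specific Ulysses settings
)
accelerator = Accelerator(parallelism_config=pc)

# Step 2: with Ulysses, each rank only sees a shard of the sequence, so a rank-local
# loss is not the loss over the full batch. Gather per-rank (loss sum, token count)
# pairs with the differentiable all_gather and reduce them to a global mean.
def aggregate_sp_loss(loss_sum: torch.Tensor, num_tokens: torch.Tensor) -> torch.Tensor:
    all_sums = all_gather(loss_sum)        # one summed loss per rank, gradients preserved
    all_counts = all_gather(num_tokens)    # one token count per rank
    return torch.stack(all_sums).sum() / torch.stack(all_counts).sum()
```

The division by the global token count assumes the per-rank loss is an unreduced sum; if your loss is already a per-rank mean, adapt the reduction accordingly.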
✨ New Features
- Integration with DeepSpeed Ulysses/ALST for efficient training on long sequences using sequence parallelism and attention head parallelism.
- Support for enabling DeepSpeed Ulysses via `ParallelismConfig` by setting `sp_backend="deepspeed"` and `sp_size`, and providing a `DeepSpeedSequenceParallelConfig`.
🐛 Bug Fixes
- Removed the warning emitted for `cpu_ram_efficient_loading`.
- Fixed an issue where `torch.optim.Optimizer` parameter states were not updated correctly after tensor parallelism.
🔧 Affected Symbols
- `ParallelismConfig`
- `DeepSpeedSequenceParallelConfig`
- `torch.distributed.nn.functional.all_gather`
- `torch.optim.Optimizer`