Changelog

v1.12.0

📦 accelerate
✨ 2 features · 🐛 2 fixes · 🔧 4 symbols

Summary

This release introduces integration with DeepSpeed Ulysses/ALST for sequence parallelism, enabling efficient training on long sequences. It also includes several minor fixes and documentation updates.

Migration Steps

  1. To enable DeepSpeed Ulysses, create a `ParallelismConfig` with `sp_backend="deepspeed"`, set `sp_size`, and pass `sp_handler=DeepSpeedSequenceParallelConfig(...)` (see the first sketch after this list).
  2. When using DeepSpeed Ulysses, ensure the loss computation correctly aggregates losses across ranks using `torch.distributed.nn.functional.all_gather`, as described in the documentation (see the second sketch after this list).
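
A minimal sketch of step 1, assuming the parameter names given in these notes (`sp_backend`, `sp_size`, `sp_handler`); the import path for `DeepSpeedSequenceParallelConfig` and the `sp_size=4` value are illustrative assumptions, so check the accelerate docs for your version:

```python
# Sketch only: parameter names taken from these release notes; import path
# for DeepSpeedSequenceParallelConfig is an assumption, not the documented one.
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig
from accelerate.utils import DeepSpeedSequenceParallelConfig  # assumed import path

pc = ParallelismConfig(
    sp_backend="deepspeed",  # route sequence parallelism through DeepSpeed Ulysses/ALST
    sp_size=4,               # illustrative: shard each sequence across 4 ranks
    sp_handler=DeepSpeedSequenceParallelConfig(),  # backend-specific SP options
)

accelerator = Accelerator(parallelism_config=pc)
```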
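
And a minimal sketch of step 2, using the differentiable `torch.distributed.nn.functional.all_gather` so gradients flow back to every rank's shard of the sequence. The helper name, the sum/count bookkeeping, and the `sp_group` argument are hypothetical illustrations, not the documented recipe:

```python
import torch
import torch.distributed.nn.functional as dist_fn

def sp_loss(local_loss_sum: torch.Tensor, local_token_count: torch.Tensor, sp_group):
    """Hypothetical helper: global mean token loss across a sequence-parallel group."""
    # Differentiable all_gather: unlike torch.distributed.all_gather, gradients
    # propagate back to each rank's local tensor.
    loss_sums = dist_fn.all_gather(local_loss_sum, group=sp_group)
    token_counts = dist_fn.all_gather(local_token_count, group=sp_group)
    # Average over all tokens of the full sequence, not just the local shard.
    return torch.stack(loss_sums).sum() / torch.stack(token_counts).sum()
```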

✨ New Features

  • Integration with DeepSpeed Ulysses/ALST for efficient training on long sequences via sequence parallelism and attention head parallelism.
  • Support for enabling DeepSpeed Ulysses through `ParallelismConfig` by setting `sp_backend="deepspeed"` and `sp_size`, and providing a `DeepSpeedSequenceParallelConfig`.

🐛 Bug Fixes

  • Removed the warning for `cpu_ram_efficient_loading`.
  • Fixed an issue where `torch.optim.Optimizer` parameter states were not updated correctly after tensor parallelism was applied.

🔧 Affected Symbols

  • `ParallelismConfig`
  • `DeepSpeedSequenceParallelConfig`
  • `torch.distributed.nn.functional.all_gather`
  • `torch.optim.Optimizer`