Changelog

v1.12.0

📦 accelerate
✨ 2 features · 🐛 2 fixes · 🔧 4 symbols

Summary

This release introduces integration with DeepSpeed Ulysses/ALST for sequence parallelism, enabling efficient training on long sequences. It also includes several minor fixes and documentation updates.

Migration Steps

  1. To enable DeepSpeed Ulysses, create a `ParallelismConfig` with `sp_backend="deepspeed"`, set `sp_size`, and pass `sp_handler=DeepSpeedSequenceParallelConfig(...)` (see the first sketch after this list).
  2. When using DeepSpeed Ulysses, ensure the loss computation correctly aggregates losses across ranks using `torch.distributed.nn.functional.all_gather`, as described in the documentation (see the second sketch after this list).
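
A minimal sketch of step 1, assuming the parameter names given in these notes (`sp_backend`, `sp_size`, `sp_handler`); the import path for `DeepSpeedSequenceParallelConfig` and the `sp_size=4` value are illustrative assumptions, so check the accelerate docs for your version:

```python
# Sketch only: parameter names taken from these release notes; import path
# for DeepSpeedSequenceParallelConfig is an assumption, not the documented one.
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig
from accelerate.utils import DeepSpeedSequenceParallelConfig  # assumed import path

pc = ParallelismConfig(
    sp_backend="deepspeed",  # route sequence parallelism through DeepSpeed Ulysses/ALST
    sp_size=4,               # illustrative: shard each sequence across 4 ranks
    sp_handler=DeepSpeedSequenceParallelConfig(),  # backend-specific SP options
)

accelerator = Accelerator(parallelism_config=pc)
```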
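
And a minimal sketch of step 2, using the differentiable `torch.distributed.nn.functional.all_gather` so gradients flow back to every rank's shard of the sequence. The helper name, the sum/count bookkeeping, and the `sp_group` argument are hypothetical illustrations, not the documented recipe:

```python
import torch
import torch.distributed.nn.functional as dist_fn

def sp_loss(local_loss_sum: torch.Tensor, local_token_count: torch.Tensor, sp_group):
    """Hypothetical helper: global mean token loss across a sequence-parallel group."""
    # Differentiable all_gather: unlike torch.distributed.all_gather, gradients
    # propagate back to each rank's local tensor.
    loss_sums = dist_fn.all_gather(local_loss_sum, group=sp_group)
    token_counts = dist_fn.all_gather(local_token_count, group=sp_group)
    # Average over all tokens of the full sequence, not just the local shard.
    return torch.stack(loss_sums).sum() / torch.stack(token_counts).sum()
```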

✨ New Features

  • Integration with DeepSpeed Ulysses/ALST for efficient training on long sequences via sequence parallelism and attention head parallelism.
  • Support for enabling DeepSpeed Ulysses through `ParallelismConfig` by setting `sp_backend="deepspeed"` and `sp_size`, and providing a `DeepSpeedSequenceParallelConfig`.

🐛 Bug Fixes

  • Removed the warning for `cpu_ram_efficient_loading`.
  • Fixed an issue where `torch.optim.Optimizer` parameter states were not updated correctly after tensor parallelism was applied.

🔧 Affected Symbols

  • `ParallelismConfig`
  • `DeepSpeedSequenceParallelConfig`
  • `torch.distributed.nn.functional.all_gather`
  • `torch.optim.Optimizer`