
v1.4.0

📦 accelerate
✨ 2 features · 🐛 4 fixes · 🔧 5 symbols

Summary

This release introduces initial FP8 training support via the `torchao` backend and initial Tensor Parallelism support for accelerate dataloaders, alongside several bug fixes, including the resolution of a critical memory leak.

Migration Steps

  1. To use the new FP8 support, pass `AORecipeKwargs` to the `Accelerator` constructor and set `mixed_precision="fp8"`, as shown in the sketch below.
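
A minimal sketch of the FP8 setup, assuming the default `AORecipeKwargs()` configuration and the standard `kwargs_handlers` mechanism (the `model` and `optimizer` names are placeholders for your own objects):

```python
from accelerate import Accelerator
from accelerate.utils import AORecipeKwargs

# Enable FP8 training through the torchao backend.
# AORecipeKwargs is passed via kwargs_handlers, and mixed_precision
# must be set to "fp8" for the recipe to take effect.
accelerator = Accelerator(
    mixed_precision="fp8",
    kwargs_handlers=[AORecipeKwargs()],
)

# Prepare your objects as usual; FP8 conversion is applied during prepare().
# model, optimizer = accelerator.prepare(model, optimizer)
```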

✨ New Features

  • Introduced initial FP8 API support via the new `torchao` backend. Enable it by passing `AORecipeKwargs` to the `Accelerator` and setting `mixed_precision="fp8"`.
  • Initial support for Tensor Parallelism (TP) when using accelerate dataloaders.

🐛 Bug Fixes

  • Fixed the `triton` version check.
  • Fixed `torch_dtype` usage in memory estimation.
  • Fixed FP8 compatibility when using DeepSpeed.
  • Resolved a memory leak by replacing strong `GradientState -> DataLoader` references with weakrefs (see the sketch after this list).
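
The weakref fix follows the standard Python pattern sketched below. This is an illustrative sketch of the idea, not accelerate's actual implementation: holding `weakref.ref` handles instead of strong references lets dataloaders be garbage collected once the rest of the program releases them.

```python
import weakref

class GradientState:
    """Illustrative sketch (not accelerate's real class) of breaking the
    GradientState -> DataLoader strong-reference leak with weakrefs."""

    def __init__(self):
        self._dataloader_refs = []

    def add_dataloader(self, dataloader):
        # Store a weak reference so GradientState does not keep the
        # DataLoader alive after the caller is done with it.
        self._dataloader_refs.append(weakref.ref(dataloader))

    @property
    def active_dataloaders(self):
        # Dereference the weakrefs, dropping any DataLoaders that have
        # already been garbage collected.
        return [dl for ref in self._dataloader_refs if (dl := ref()) is not None]
```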

🔧 Affected Symbols

  • `Accelerator`
  • `AORecipeKwargs`
  • `GradientState`
  • `DataLoader`
  • `get_quantized_model_device_map`