
v1.4.0

📦 accelerate
✨ 2 features · 🐛 4 fixes · 🔧 5 symbols

Summary

This release introduces initial FP8 training support via the `torchao` backend and initial Tensor Parallelism support for accelerate dataloaders, alongside several bug fixes, including the resolution of a critical memory leak.

Migration Steps

  1. To use the new FP8 support, pass `AORecipeKwargs` to the `Accelerator` constructor and set `mixed_precision="fp8"`, as shown in the sketch below.
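
A minimal sketch of the FP8 setup, assuming the default `AORecipeKwargs()` configuration and the standard `kwargs_handlers` mechanism (the `model` and `optimizer` names are placeholders for your own objects):

```python
from accelerate import Accelerator
from accelerate.utils import AORecipeKwargs

# Enable FP8 training through the torchao backend.
# AORecipeKwargs is passed via kwargs_handlers, and mixed_precision
# must be set to "fp8" for the recipe to take effect.
accelerator = Accelerator(
    mixed_precision="fp8",
    kwargs_handlers=[AORecipeKwargs()],
)

# Prepare your objects as usual; FP8 conversion is applied during prepare().
# model, optimizer = accelerator.prepare(model, optimizer)
```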

✨ New Features

  • Introduced initial FP8 API support via the new `torchao` backend. Enable it by passing `AORecipeKwargs` to the `Accelerator` and setting `mixed_precision="fp8"`.
  • Initial support for Tensor Parallelism (TP) when using accelerate dataloaders.

🐛 Bug Fixes

  • Fixed the `triton` version check.
  • Fixed `torch_dtype` usage in memory estimation.
  • Fixed FP8 compatibility when using DeepSpeed.
  • Resolved a memory leak by replacing strong `GradientState -> DataLoader` references with weakrefs (see the sketch after this list).
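
The weakref fix follows the standard Python pattern sketched below. This is an illustrative sketch of the idea, not accelerate's actual implementation: holding `weakref.ref` handles instead of strong references lets dataloaders be garbage collected once the rest of the program releases them.

```python
import weakref

class GradientState:
    """Illustrative sketch (not accelerate's real class) of breaking the
    GradientState -> DataLoader strong-reference leak with weakrefs."""

    def __init__(self):
        self._dataloader_refs = []

    def add_dataloader(self, dataloader):
        # Store a weak reference so GradientState does not keep the
        # DataLoader alive after the caller is done with it.
        self._dataloader_refs.append(weakref.ref(dataloader))

    @property
    def active_dataloaders(self):
        # Dereference the weakrefs, dropping any DataLoaders that have
        # already been garbage collected.
        return [dl for ref in self._dataloader_refs if (dl := ref()) is not None]
```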

🔧 Affected Symbols

  • `Accelerator`
  • `AORecipeKwargs`
  • `GradientState`
  • `DataLoader`
  • `get_quantized_model_device_map`