v1.4.0
📦 accelerate · View on GitHub →
✨ 2 features · 🐛 4 fixes · 🔧 5 symbols
Summary
This release introduces initial support for FP8 training via the `torchao` backend and initial Tensor Parallelism support for `accelerate` dataloaders, alongside several bug fixes, including the resolution of a critical memory leak.
Migration Steps
- To use the new FP8 support, pass `AORecipeKwargs` to the `Accelerator` constructor and set `mixed_precision="fp8"`.
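A minimal sketch of the migration, assuming `AORecipeKwargs` is importable from `accelerate.utils` like the other kwargs handlers; the model and optimizer below are placeholders:

```python
# Sketch: enabling FP8 training via the torchao backend.
# Requires recent CUDA hardware with FP8 support.
import torch
from accelerate import Accelerator
from accelerate.utils import AORecipeKwargs

accelerator = Accelerator(
    mixed_precision="fp8",               # request FP8 training
    kwargs_handlers=[AORecipeKwargs()],  # select the torchao backend
)

model = torch.nn.Linear(1024, 1024)      # placeholder model
optimizer = torch.optim.AdamW(model.parameters())

# prepare() applies the FP8 recipe to the model before training
model, optimizer = accelerator.prepare(model, optimizer)
```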
✨ New Features
- Introduced initial FP8 API support via the new `torchao` backend. Enable it by passing `AORecipeKwargs` to the `Accelerator` and setting `mixed_precision="fp8"` (see the migration example above).
- Added initial support for Tensor Parallelism (TP) when using `accelerate` dataloaders, as sketched below.
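A hedged sketch of pairing a tensor-parallel model with an `accelerate` dataloader. Obtaining the TP-sharded model via transformers' `tp_plan="auto"` is an assumption about the intended workflow, and the model id and toy dataset are placeholders; run under `torchrun` with multiple GPUs:

```python
# Sketch: an accelerate dataloader feeding a tensor-parallel model.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()

# Placeholder model id; TP requires a model that defines a TP plan.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B", tp_plan="auto"
)

# Toy dataset standing in for real tokenized inputs.
dataset = TensorDataset(torch.randint(0, 32000, (64, 16)))
dataloader = DataLoader(dataset, batch_size=8)

# prepare() dispatches batches so each TP rank sees consistent inputs.
dataloader = accelerator.prepare(dataloader)
```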
🐛 Bug Fixes
- Fixed the `triton` version check.
- Fixed `torch_dtype` usage in memory estimation.
- Fixed FP8 compatibility when using DeepSpeed.
- Resolved a memory leak by replacing strong `GradientState -> DataLoader` references with weak references; the pattern is sketched below.
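The pattern behind the leak fix can be illustrated with a small sketch; the class and attribute names are simplified stand-ins, not `accelerate`'s actual internals:

```python
# Sketch: holding DataLoaders through weakref so a GradientState-like
# registry does not keep them (and their cached state) alive.
import weakref

class GradientStateSketch:
    def __init__(self):
        self._dataloader_refs = []

    def register(self, dataloader):
        # Weak reference: registering no longer extends the DataLoader's
        # lifetime, so it can be garbage collected when the caller drops it.
        self._dataloader_refs.append(weakref.ref(dataloader))

    @property
    def active_dataloaders(self):
        # Dereference, silently dropping loaders that were collected.
        return [dl for ref in self._dataloader_refs if (dl := ref()) is not None]
```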
🔧 Affected Symbols
- `Accelerator`
- `AORecipeKwargs`
- `GradientState`
- `DataLoader`
- `get_quantized_model_device_map`