v2.11.0
Summary
PyTorch 2.11 adds Differentiable Collectives and FlexAttention updates. Its breaking changes include PyPI wheels that now default to CUDA 13.0 and API changes to variable-length attention and `torch.hub` loading.
⚠️ Breaking Changes
- Volta (SM 7.0) GPU support has been removed from the CUDA 12.8 and 12.9 pre-built binaries because of an incompatibility with cuDNN 9.15.1. Users on Volta GPUs should use the CUDA 12.6 builds, or build from source with Volta support enabled.
- PyPI wheels now ship with CUDA 13.0 by default. Users with only CUDA 12.x drivers might encounter errors unless they specify an index URL (e.g., for cu128 or cu126 builds).
- CUDA 13.0 wheels no longer support Maxwell and Pascal GPUs on Linux x86_64; use CUDA 12.6 builds for these architectures.
- The default value of `trust_repo` in `torch.hub.list()`, `torch.hub.load()`, and `torch.hub.help()` changed from `None` to `"check"`. Code that relied on `trust_repo=None` should pass `trust_repo=True` to skip the prompt, or `trust_repo="check"` to make the new default explicit.
- The signature of `torch.nn.attention.varlen_attn` changed: a `*` separator was inserted, so optional arguments such as `return_aux` and `scale` must now be passed by keyword. A new `window_size` keyword argument was also added.
- The `is_causal` parameter has been removed from `torch.nn.attention.varlen_attn`. Causal masking must now be specified using `window_size=(-1, 0)`.
- The C++ `DebugInfoWriter` now honors `$XDG_CACHE_HOME` for its cache directory, defaulting to `~/.cache/torch` only if `$XDG_CACHE_HOME` is unset. This may change where debug info is written compared to PyTorch 2.10.
- Code that skips `dist.init_process_group`, constructs a `DeviceMesh`, and then creates process groups separately may break when using `torch.compile`. Process groups must now exist before constructing the `DeviceMesh` because `DeviceMesh` stores the process group registry internally for tracing.
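The two `varlen_attn` changes above can be handled mechanically at call sites. The sketch below is a minimal, hypothetical helper (`migrate_varlen_kwargs` is not part of PyTorch) that rewrites a dictionary of legacy keyword arguments: it drops `is_causal` and, when it was `True`, sets `window_size=(-1, 0)` as the release notes describe. It assumes that omitting `window_size` leaves attention non-causal, which should be checked against the `varlen_attn` documentation.

```python
from typing import Any, Dict


def migrate_varlen_kwargs(legacy_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate pre-2.11 varlen_attn keyword arguments to the 2.11 form.

    Hypothetical helper, not a PyTorch API. Only the is_causal -> window_size
    mapping comes from the release notes; other keys pass through unchanged.
    """
    kwargs = dict(legacy_kwargs)  # copy so the caller's dict is not mutated
    is_causal = kwargs.pop("is_causal", False)
    if is_causal:
        if "window_size" in kwargs:
            raise ValueError("pass either is_causal or window_size, not both")
        # Per the release notes, causal masking is now window_size=(-1, 0).
        kwargs["window_size"] = (-1, 0)
    # Assumption: when is_causal was False or absent, window_size is left
    # at the library default rather than set here.
    return kwargs
```

A call site would then unpack the result after its positional arguments with `**migrate_varlen_kwargs(old_kwargs)`, which also satisfies the keyword-only requirement.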
Migration Steps
- If using Volta GPUs (SM 7.0) with CUDA 12.8/12.9, switch to the CUDA 12.6 builds: `pip install torch --index-url https://download.pytorch.org/whl/cu126`.
- If you encounter errors installing PyTorch via `pip install torch` on systems with older CUDA drivers, explicitly specify the index URL for the desired CUDA version (e.g., `cu128` or `cu126`).
- If using Maxwell or Pascal GPUs on Linux x86_64 with PyPI wheels, switch to CUDA 12.6 builds: `pip install torch --index-url https://download.pytorch.org/whl/cu126`.
- Review calls to `torch.hub.load()`, `torch.hub.list()`, and `torch.hub.help()` that relied on the old default or passed `trust_repo=None`, and set `trust_repo=True` or `trust_repo="check"` explicitly.
- Update calls to `torch.nn.attention.varlen_attn` to pass optional arguments such as `return_aux` and `scale` by keyword.
- Replace calls using `is_causal=True` in `varlen_attn` with `window_size=(-1, 0)`.
- Ensure `dist.init_process_group` is called before constructing `torch.distributed.DeviceMesh` if using `torch.compile`.
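The index-URL steps above all follow the same pattern, so they can be scripted. The sketch below builds the `pip` command for a given CUDA build tag. The helper name `torch_install_command` is hypothetical, and it assumes that a tag such as `cu126` or `cu128` maps onto `https://download.pytorch.org/whl/<tag>` in the same way as the commands above. It only constructs the command and does not run it.

```python
import sys
from typing import List

# Base of the wheel index used in the migration steps above.
PYTORCH_WHEEL_INDEX = "https://download.pytorch.org/whl"


def torch_install_command(cuda_tag: str) -> List[str]:
    """Return a pip command that installs torch from a CUDA-specific index.

    Hypothetical helper: cuda_tag is a build tag such as "cu126" or "cu128".
    """
    # Reject anything that does not look like "cu" followed by digits.
    if not (cuda_tag.startswith("cu") and cuda_tag[2:].isdigit()):
        raise ValueError(f"unexpected CUDA build tag: {cuda_tag!r}")
    return [
        sys.executable, "-m", "pip", "install", "torch",
        "--index-url", f"{PYTORCH_WHEEL_INDEX}/{cuda_tag}",
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would perform the install in the current interpreter's environment.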
✨ New Features
- Added Differentiable Collectives for distributed training.
- FlexAttention now includes a FlashAttention-4 backend for Hopper and Blackwell GPUs.
- Expanded operator coverage for MPS (Apple Silicon).
- Added RNN/LSTM GPU export support.
- Added XPU Graph support.