
PyTorch

Data & ML

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Latest: v2.12.0 | 9 releases | 9 breaking changes | 21 common errors

Release History

v2.12.0 (Breaking, 6 features)
May 13, 2026

PyTorch 2.12 introduces significant performance improvements, notably in batched linalg.eigh on CUDA and fused Adagrad optimization. This release also enforces stricter build requirements, including C++20 and CUDA 12.6 for source builds, and updates distributed functional API usage within torch.compile.

v2.11.0 (Breaking, 5 features)
Mar 23, 2026

PyTorch 2.11 adds Differentiable Collectives and FlexAttention updates. Its breaking changes include moving PyPI wheels to CUDA 13.0 and modifying the APIs for variable-length attention and hub loading.

v2.10.0 (Breaking, 14 features)
Jan 21, 2026

PyTorch 2.10 introduces Python 3.14 support for torch.compile, new features like combo-kernels fusion and LocalTensor for distributed debugging, and removes several deprecated or legacy functionalities across ONNX, Dataloader, and nn modules.

v2.9.1 (Breaking, 12 fixes, 3 features)
Nov 12, 2025

This maintenance release addresses critical regressions in PyTorch 2.9.0, specifically fixing memory issues in 3D convolutions, Inductor compilation bugs for Gemma/vLLM, and various distributed and numeric stability fixes.

v2.9.0 (Breaking, 1 fix, 7 features)
Oct 15, 2025

PyTorch 2.9.0 raises the minimum supported Python version to 3.10, defaults the ONNX exporter to the Dynamo-based pipeline, and adds support for symmetric memory and FlexAttention on new hardware.

v2.8.0 (Breaking, 3 fixes, 10 features)
Aug 6, 2025

PyTorch 2.8.0 introduces high-performance quantized LLM inference on Intel CPUs, SYCL support for CPP extensions, and stricter validation for autograd and torch.compile. It includes significant breaking changes regarding CUDA architecture support and internal configuration renames.

v2.7.1 (Breaking, 16 fixes, 3 features)
Jun 4, 2025

This maintenance release focuses on fixing regressions and silent correctness issues across torch.compile, Distributed, and FlexAttention, while also improving wheel sizes and platform-specific compatibility for macOS, Windows, and XPU.

v2.7.0 (Breaking, 1 fix, 9 features)
Apr 23, 2025

PyTorch 2.7.0 introduces Blackwell support and FlexAttention optimizations while enforcing stricter C++ API visibility and Python limited API compliance. It marks a significant shift in ONNX and Export workflows by deprecating legacy capture methods in favor of the unified torch.export API.

v2.6.0 (Breaking, 10 features)
Jan 29, 2025

PyTorch 2.6 introduces Python 3.13 support for torch.compile, FP16 support for X86 CPUs, and new AOTInductor packaging APIs. It includes a significant security change making torch.load use weights_only=True by default and deprecates the official Anaconda channel.
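To illustrate the `weights_only=True` default, here is a minimal sketch that round-trips a state dict through an in-memory buffer. The model shape and variable names are illustrative assumptions. Passing `weights_only=True` explicitly keeps the behavior the same on releases before 2.6.

```python
import io

import torch
from torch import nn

# A small illustrative model; any module with a plain tensor state dict works.
model = nn.Linear(4, 2)

# Save only the state dict (tensors in a dict), not the pickled module object.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# weights_only=True restricts unpickling to tensors and basic containers.
# It is the default from 2.6 on; stating it makes the intent explicit.
state_dict = torch.load(buffer, weights_only=True)

restored = nn.Linear(4, 2)
restored.load_state_dict(state_dict)
```

Checkpoints that pickle arbitrary Python objects may fail to load under this default; saving state dicts rather than whole modules avoids the problem.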

Common Errors

AssertionError (4 reports)

AssertionError in PyTorch Inductor often arises from stride mismatches during tensor layout optimization, particularly in convolutional operations: the strides Inductor predicts for a tensor after a series of operations do not match the actual strides. To fix it, inspect the stride calculations in Inductor's layout propagation rules and check that they reflect the effects of operations such as convolution, transpose, and reshape. If necessary, override the predicted layout manually, or disable layout optimization as a workaround with `torch._inductor.config.layout_optimization = False`.

NotImplementedError (3 reports)

torch.NotImplementedError usually arises when a requested function or operation lacks a specific implementation for the given data type, device (CPU/GPU), or configuration. To resolve it, either implement the missing functionality for the specific case (e.g., register a kernel for a specific dtype), use the function with supported data types/devices, or choose an alternative implementation or workaround that achieves the same result within PyTorch's supported functionalities. Ensure the documentation reflects any limitations or constraints to prevent unexpected errors.
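One possible workaround is to retry the operation on a device where it is supported. The sketch below assumes a hypothetical helper, `run_with_cpu_fallback`, that is not part of PyTorch. It catches `NotImplementedError`, reruns the operation on CPU, and moves the result back to the original device.

```python
import torch


def run_with_cpu_fallback(op, *tensors):
    """Hypothetical helper: run `op`, retrying on CPU if no kernel exists."""
    try:
        return op(*tensors)
    except NotImplementedError:
        # Remember where the inputs lived so the result can be moved back.
        original_device = tensors[0].device
        cpu_result = op(*(t.cpu() for t in tensors))
        return cpu_result.to(original_device)


# Usage: the call succeeds directly when the op is supported on the device.
result = run_with_cpu_fallback(torch.linalg.inv, torch.eye(2))
```

The fallback adds device transfers, so it suits debugging or rarely used code paths better than hot loops.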

InternalTorchDynamoError (3 reports)

InternalTorchDynamoError often arises from unexpected errors during graph compilation or execution within TorchDynamo, such as unsupported operations or incorrect assumptions about tensor shapes. To fix this, carefully examine the traceback for the root cause, often indicating a specific line of code or operator triggering the error. Then, either rewrite the code to use Dynamo-compatible operations, add guards to avoid problematic code paths during compilation, or disable compilation for that section using `torch._dynamo.disable` as a last resort while reporting the issue.

ProcessRaisedException (3 reports)

ProcessRaisedException is raised by `torch.multiprocessing.spawn` when a child process fails with an exception; the message includes the child's traceback. The reported cases often involve CUDA device handling or argument mismatches in distributed operations or TorchInductor code running in worker processes. Ensure CUDA devices are correctly initialized and visible to all processes, and verify that every function and class call made inside a worker passes the argument count and types the PyTorch or TorchInductor API expects, paying special attention to distributed configurations.

RuntimeError (3 reports)

RuntimeError in PyTorch often arises when operations encounter invalid input values or conditions during execution, particularly in compiled or optimized code paths like those generated by `torch.compile`. To fix this, add explicit input validation checks (e.g., using `torch.clamp` or assertions) before the problematic operation to ensure data falls within the expected range or satisfies required conditions. If the error occurs due to fake tensor propagation, ensure that all necessary operators are defined for the custom tensor type when tracing.
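A minimal sketch of such a validation step, assuming a hypothetical `checked_log` wrapper and an illustrative `eps` threshold: it rejects non-finite inputs with an explicit error and clamps the rest into the domain of `torch.log`.

```python
import torch


def checked_log(x, eps=1e-6):
    """Hypothetical wrapper: validate inputs, then take a numerically safe log."""
    # Fail early with a clear message instead of propagating NaN or Inf.
    if not torch.isfinite(x).all():
        raise ValueError("input contains NaN or Inf")
    # Clamp so that zeros and negatives cannot produce -inf or NaN.
    return torch.log(torch.clamp(x, min=eps))


values = torch.tensor([0.0, 0.5, 2.0])
result = checked_log(values)
```

The Python-level `if` is intended to run outside a compiled region; inside `torch.compile` it would likely cause a graph break, whereas the `torch.clamp` call can be traced.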

OutOfMemoryError (3 reports)

OutOfMemoryError in PyTorch usually stems from requesting more GPU memory than is available. Fix this by reducing batch size, model size, or sequence length, and release unused tensors with `del` followed by `torch.cuda.empty_cache()` so that cached blocks are returned to the device. Consider gradient accumulation or mixed-precision training (e.g., with `torch.autocast`, which supersedes the deprecated `torch.cuda.amp` entry points) to lower the memory footprint further.
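The sketch below combines gradient accumulation with mixed precision via `torch.autocast`. The model, the random data, and `accum_steps` are illustrative assumptions. It uses bfloat16, which needs no gradient scaler, and falls back to CPU when no GPU is present.

```python
import torch
from torch import nn

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(16, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
initial_weight = model.weight.detach().clone()

# Four micro-batches of 8 samples act like one batch of 32,
# while only 8 samples' activations are held in memory at a time.
accum_steps = 4
micro_batches = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(8)]

optimizer_steps = 0
optimizer.zero_grad(set_to_none=True)
for i, (inputs, targets) in enumerate(micro_batches, start=1):
    inputs, targets = inputs.to(device), targets.to(device)
    # Run the forward pass in bfloat16 to reduce activation memory.
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        preds = model(inputs)
    # Compute the loss in float32 and scale it so gradients average over the window.
    loss = nn.functional.mse_loss(preds.float(), targets) / accum_steps
    loss.backward()
    if i % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
        optimizer_steps += 1
```

With float16 instead of bfloat16, a gradient scaler would normally be added to avoid underflow.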
