PyTorch
Data & ML
Tensors and dynamic neural networks in Python with strong GPU acceleration
Release History
v2.12.0 (Breaking, 6 features): PyTorch 2.12 introduces significant performance improvements, notably in batched linalg.eigh on CUDA and fused Adagrad optimization. This release also enforces stricter build requirements, including C++20 and CUDA 12.6 for source builds, and updates distributed functional API usage within torch.compile.
v2.11.0 (Breaking, 5 features): PyTorch 2.11 introduces major highlights like Differentiable Collectives and FlexAttention updates, but enforces breaking changes by moving PyPI wheels to CUDA 13.0 and modifying APIs for variable length attention and hub loading.
v2.10.0 (Breaking, 14 features): PyTorch 2.10 introduces Python 3.14 support for torch.compile, new features like combo-kernels fusion and LocalTensor for distributed debugging, and removes several deprecated or legacy functionalities across ONNX, Dataloader, and nn modules.
v2.9.1 (Breaking, 12 fixes, 3 features): This maintenance release addresses critical regressions in PyTorch 2.9.0, specifically fixing memory issues in 3D convolutions, Inductor compilation bugs for Gemma/vLLM, and various distributed and numeric stability fixes.
v2.9.0 (Breaking, 1 fix, 7 features): PyTorch 2.9.0 introduces Python 3.10 as the minimum requirement, defaults the ONNX exporter to the Dynamo-based pipeline, and adds support for symmetric memory and FlexAttention on new hardware.
v2.8.0 (Breaking, 3 fixes, 10 features): PyTorch 2.8.0 introduces high-performance quantized LLM inference on Intel CPUs, SYCL support for CPP extensions, and stricter validation for autograd and torch.compile. It includes significant breaking changes regarding CUDA architecture support and internal configuration renames.
v2.7.1 (Breaking, 16 fixes, 3 features): This maintenance release focuses on fixing regressions and silent correctness issues across torch.compile, Distributed, and Flex Attention, while also improving wheel sizes and platform-specific compatibility for MacOS, Windows, and XPU.
v2.7.0 (Breaking, 1 fix, 9 features): PyTorch 2.7.0 introduces Blackwell support and FlexAttention optimizations while enforcing stricter C++ API visibility and Python limited API compliance. It marks a significant shift in ONNX and Export workflows by deprecating legacy capture methods in favor of the unified torch.export API.
v2.6.0 (Breaking, 10 features): PyTorch 2.6 introduces Python 3.13 support for torch.compile, FP16 support for X86 CPUs, and new AOTInductor packaging APIs. It includes a significant security change making torch.load use weights_only=True by default and deprecates the official Anaconda channel.
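The `torch.load` default change noted for v2.6.0 can be illustrated with a minimal sketch. It assumes a checkpoint that contains only a state dict of tensors; the `Linear` model and the in-memory buffer are illustrative choices, not taken from the release notes. Passing `weights_only=True` explicitly should give the same behavior on releases before and after the default changed.

```python
import io

import torch

# Illustrative model; any module with a tensor-only state dict would do.
model = torch.nn.Linear(4, 2)

# Save only the state dict (tensors), not the pickled module object.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)

# Load with the restricted unpickler. Being explicit about weights_only
# avoids depending on which default the installed release uses.
buffer.seek(0)
state = torch.load(buffer, weights_only=True)

restored = torch.nn.Linear(4, 2)
restored.load_state_dict(state)
```

Checkpoints that pickle arbitrary Python objects may fail to load under `weights_only=True`; saving state dicts rather than whole modules sidesteps most of those cases.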
Common Errors
AssertionError (4 reports): AssertionError in PyTorch Inductor often arises from stride mismatches during tensor layout optimization, particularly in convolutional operations. This happens when the strides Inductor predicts for a tensor after a series of operations do not match the actual strides. To fix it, inspect the stride calculations within Inductor's layout propagation rules and check that they accurately reflect the effects of operations like convolution, transpose, and reshaping. As a workaround, try disabling layout optimization with `torch._inductor.config.layout_optimization = False`; the related `force_layout_optimization` flag also lives in `torch._inductor.config`.
NotImplementedError (3 reports): NotImplementedError usually arises when a requested function or operation lacks an implementation for the given data type, device (CPU/GPU), or configuration. To resolve it, either implement the missing functionality for the specific case (e.g., register a kernel for a specific dtype), call the function with supported data types and devices, or choose an alternative implementation or workaround that achieves the same result with functionality PyTorch supports. Document any remaining limitations or constraints to prevent unexpected errors.
InternalTorchDynamoError (3 reports): InternalTorchDynamoError often arises from unexpected errors during graph compilation or execution within TorchDynamo, such as unsupported operations or incorrect assumptions about tensor shapes. To fix this, carefully examine the traceback for the root cause, often indicating a specific line of code or operator triggering the error. Then, either rewrite the code to use Dynamo-compatible operations, add guards to avoid problematic code paths during compilation, or disable compilation for that section using `torch._dynamo.disable` as a last resort while reporting the issue.
ProcessRaisedException (3 reports): ProcessRaisedException in PyTorch often arises from issues within multiprocessing contexts, specifically related to CUDA device handling or argument mismatches during distributed operations or within TorchInductor. Ensure CUDA devices are correctly initialized and visible to all processes, and verify that all function/class calls within multiprocessing conform to the expected argument count and types as defined by PyTorch or TorchInductor APIs, paying special attention to distributed configurations.
RuntimeError (3 reports): RuntimeError in PyTorch often arises when operations encounter invalid input values or conditions during execution, particularly in compiled or optimized code paths like those generated by `torch.compile`. To fix this, add explicit input validation checks (e.g., using `torch.clamp` or assertions) before the problematic operation to ensure data falls within the expected range or satisfies required conditions. If the error occurs due to fake tensor propagation, ensure that all necessary operators are defined for the custom tensor type when tracing.
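The input-validation advice for RuntimeError can be sketched as follows, assuming the failing operation is an index lookup with `torch.gather`. The helpers `checked_gather` and `clamped_gather` are hypothetical names introduced here for illustration; they are not PyTorch APIs. One variant rejects bad indices with a clear message, the other clamps them into range with `torch.clamp`.

```python
import torch


def checked_gather(values: torch.Tensor, index: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: gather along dim 0 after validating the indices."""
    if index.dtype != torch.int64:
        raise TypeError(f"index must be int64, got {index.dtype}")
    size = values.size(0)
    # Reject out-of-range indices before they reach the kernel.
    if index.numel() > 0 and (index.min() < 0 or index.max() >= size):
        raise ValueError(f"index values must lie in [0, {size - 1}]")
    return torch.gather(values, 0, index)


def clamped_gather(values: torch.Tensor, index: torch.Tensor) -> torch.Tensor:
    """Hypothetical alternative: clamp indices into range instead of raising."""
    return torch.gather(values, 0, index.clamp(0, values.size(0) - 1))


values = torch.tensor([10.0, 20.0, 30.0])
picked = checked_gather(values, torch.tensor([2, 0]))
clamped = clamped_gather(values, torch.tensor([5, -1]))
```

Raising early is usually easier to debug than clamping, which can hide an upstream bug; clamping is only appropriate when saturating the index is the intended behavior.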
OutOfMemoryError (3 reports): OutOfMemoryError in PyTorch usually stems from requesting more GPU memory than is available. Fix this by reducing batch size, model size, or sequence length. Dropping references to unused tensors with `del` lets the caching allocator reuse their memory, and `torch.cuda.empty_cache()` returns cached but unused blocks to the driver. Consider using gradient accumulation or mixed-precision training (e.g., with `torch.amp.autocast`, which supersedes `torch.cuda.amp.autocast`) to further lower the memory footprint.
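Gradient accumulation, mentioned above as a way to lower peak memory, can be sketched as below. This is a minimal CPU example under assumed settings: the model, data, and `micro_batch_size` are illustrative, and it assumes a mean-reduced loss with equally sized micro-batches so that dividing by the number of steps reproduces the full-batch gradient.

```python
import torch

torch.manual_seed(0)

# Illustrative model and data; sizes are arbitrary.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

inputs = torch.randn(16, 8)
targets = torch.randn(16, 1)

micro_batch_size = 4  # only this many samples are in flight at once
accum_steps = inputs.size(0) // micro_batch_size

optimizer.zero_grad()
for x, y in zip(inputs.split(micro_batch_size), targets.split(micro_batch_size)):
    # Scale each micro-batch loss so the summed gradients match the
    # gradient of the mean loss over the full batch.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in .grad across micro-batches

# Keep a copy of the accumulated gradient for inspection.
accumulated_grad = model.weight.grad.clone()

# One optimizer step for the whole effective batch.
optimizer.step()
```

Activations are only held for one micro-batch at a time, so peak memory tracks `micro_batch_size` rather than the effective batch size, at the cost of more forward and backward passes per optimizer step.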
Related Data & ML Packages
TensorFlow: An Open Source Machine Learning Framework for Everyone
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
scikit-learn: machine learning in Python
pandas: Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Streamlit: A faster way to build and share data apps.
Gradio: Build and share delightful machine learning apps, all in Python.