PyTorch
Data & ML
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Release History
v2.11.0 (Breaking, 5 features)
PyTorch 2.11 introduces major highlights such as Differentiable Collectives and FlexAttention updates, and carries breaking changes: PyPI wheels move to CUDA 13.0, and the variable-length attention and hub-loading APIs change.
v2.10.0 (Breaking, 14 features)
PyTorch 2.10 adds Python 3.14 support for torch.compile, new features such as combo-kernel fusion and LocalTensor for distributed debugging, and removes several deprecated or legacy functionalities across ONNX, DataLoader, and nn modules.
v2.9.1 (Breaking, 12 fixes, 3 features)
This maintenance release addresses critical regressions in PyTorch 2.9.0: memory issues in 3D convolutions, Inductor compilation bugs affecting Gemma/vLLM, and various distributed and numeric-stability fixes.
v2.9.0 (Breaking, 1 fix, 7 features)
PyTorch 2.9.0 raises the minimum Python requirement to 3.10, makes the Dynamo-based pipeline the default ONNX exporter, and adds support for symmetric memory and FlexAttention on new hardware.
v2.8.0 (Breaking, 3 fixes, 10 features)
PyTorch 2.8.0 introduces high-performance quantized LLM inference on Intel CPUs, SYCL support for C++ extensions, and stricter validation for autograd and torch.compile. It includes significant breaking changes to CUDA architecture support and renames internal configuration options.
v2.7.1 (Breaking, 16 fixes, 3 features)
This maintenance release fixes regressions and silent-correctness issues across torch.compile, Distributed, and FlexAttention, while also reducing wheel sizes and improving platform-specific compatibility on macOS, Windows, and XPU.
v2.7.0 (Breaking, 1 fix, 9 features)
PyTorch 2.7.0 introduces Blackwell support and FlexAttention optimizations while enforcing stricter C++ API visibility and Python limited-API compliance. It marks a significant shift in ONNX and export workflows by deprecating legacy capture methods in favor of the unified torch.export API.
v2.6.0 (Breaking, 10 features)
PyTorch 2.6 introduces Python 3.13 support for torch.compile, FP16 support for x86 CPUs, and new AOTInductor packaging APIs. It includes a significant security change, making torch.load default to weights_only=True, and deprecates the official Anaconda channel.
Common Errors
TorchRuntimeError (12 reports)
TorchRuntimeError in PyTorch often arises from incompatible tensor datatypes within an operation, or from moving tensors between CPU and GPU without proper casting (e.g., .float(), .long(), .cuda()). Resolve it by ensuring all tensors involved in an operation have compatible datatypes and live on the same device: explicitly cast tensors and call .to(device) before the operation.
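A minimal sketch of that fix, using a hypothetical dtype mismatch between an integer tensor and a float tensor:

```python
import torch

# Pick whichever device is available; all tensors in one op must share it.
device = "cuda" if torch.cuda.is_available() else "cpu"

counts = torch.arange(4)   # int64 tensor on CPU
weights = torch.ones(4)    # float32 tensor on CPU

# torch.dot(counts, weights) would raise a RuntimeError: the dtypes differ.
# Cast and move both tensors explicitly before the operation:
counts = counts.to(device=device, dtype=torch.float32)
weights = weights.to(device)

total = torch.dot(counts, weights)  # 0 + 1 + 2 + 3
```

Doing the cast and the device move in a single `.to(...)` call avoids an extra intermediate copy.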
OpCheckError (4 reports)
OpCheckError in PyTorch custom-operator testing usually means the operator's output does not match the reference output computed with NumPy, violating the OpInfo contract. Debug by inspecting the operator's forward and backward implementations, paying close attention to data types, tensor shapes, and numerical precision, and fix the operator so that it matches NumPy's behavior for the same inputs.
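One way to narrow down such mismatches is to compare the operator directly against a NumPy reference on the same inputs, much as the OpInfo tests do. A sketch with a hypothetical `my_softplus` forward, for illustration only:

```python
import numpy as np
import torch

def my_softplus(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical custom-op forward under test.
    return torch.log1p(torch.exp(x))

x = torch.linspace(-3.0, 3.0, steps=7)

# Reference result computed with NumPy on the same inputs.
reference = np.log1p(np.exp(x.numpy()))

# A mismatch here is the kind of discrepancy that surfaces as an
# OpCheckError in the real test suite; tightening dtypes and precision
# in the operator is the usual fix.
assert np.allclose(my_softplus(x).numpy(), reference, atol=1e-6)
```

Checking a range of inputs (negative, zero, large positive) helps catch precision problems that a single sample would miss.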
OutOfMemoryError (3 reports)
OutOfMemoryError in PyTorch usually stems from allocating more GPU memory than is available. Fix it by reducing batch size, model size, or sequence length, and explicitly release unused tensors with `del` and `torch.cuda.empty_cache()`. Consider gradient accumulation or mixed-precision training (e.g., with `torch.cuda.amp`) to further lower the memory footprint.
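Gradient accumulation keeps the effective batch size while cutting peak memory, by processing smaller micro-batches and deferring the optimizer step. A CPU-runnable sketch (the model and sizes are illustrative):

```python
import torch
from torch import nn

model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

accum_steps = 4                 # 4 micro-batches = 1 effective batch
data = torch.randn(16, 8)
targets = torch.randn(16, 1)

optimizer.zero_grad()
for step, (x, y) in enumerate(zip(data.split(4), targets.split(4))):
    # Scale the loss so the accumulated gradient matches a full batch.
    loss = nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()             # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()        # one update per effective batch
        optimizer.zero_grad()
```

Peak memory now scales with the micro-batch size (here 4) rather than the full batch of 16.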
ProcessRaisedException (3 reports)
ProcessRaisedException in PyTorch often arises in multiprocessing contexts, typically from CUDA device handling or argument mismatches during distributed operations or within TorchInductor. Ensure CUDA devices are correctly initialized and visible to all processes, and verify that every function and class call made inside a worker process matches the argument count and types expected by the PyTorch or TorchInductor APIs, paying special attention to distributed configurations.
RefResolutionError (2 reports)
RefResolutionError in PyTorch FX usually arises when the tracer encounters a symbol (e.g., a function or module) it cannot resolve within the FX graph's scope. To fix this, either pass all necessary modules and functions explicitly as arguments or attributes of the module being traced, or use `torch.fx.wrap` to expose external functions to the tracer. If the problem persists, trace at a higher level or restructure the code to improve traceability.
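A minimal sketch of the `torch.fx.wrap` route, with a hypothetical free function `external_scale` that the tracer would otherwise try to trace through:

```python
import torch
import torch.fx

def external_scale(x):
    # Free function defined outside the module being traced.
    return x * 2

# Register the function by name so symbolic_trace records an opaque
# call to it instead of tracing into its body.
torch.fx.wrap("external_scale")

class Model(torch.nn.Module):
    def forward(self, x):
        return external_scale(x) + 1

graph_module = torch.fx.symbolic_trace(Model())
out = graph_module(torch.ones(2))
```

Note that `torch.fx.wrap` must be called at module level, in the same module where the function is defined.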
NoValidChoicesError (2 reports)
"NoValidChoicesError" in PyTorch Inductor usually indicates that no viable backend implementation (e.g., a GEMM or convolution kernel) satisfies all constraints for a given operation, often because of unsupported data types, shapes, or hardware features. To fix it, either rewrite the operation using supported data types, shapes, or layouts, or investigate and enable (or implement) a missing backend implementation in Inductor, which often requires understanding Inductor's code generation.
Related Data & ML Packages
TensorFlow: An Open Source Machine Learning Framework for Everyone
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
scikit-learn: machine learning in Python
pandas: flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Streamlit: a faster way to build and share data apps.
Gradio: build and share delightful machine learning apps, all in Python.