PyTorch Lightning
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
Release History
2.6.1 (breaking, 9 fixes, 3 features): This patch introduces method chaining for freezing/unfreezing modules and adds litlogger integration. It also drops support for Python 3.9 and fixes several bugs related to checkpointing, hyperparameter saving, and distributed sampling.
2.6.0 (14 fixes, 7 features): Version 2.6.0 introduces new features such as WeightAveraging callbacks and Torch-TensorRT integration, alongside numerous bug fixes across PyTorch Lightning and Fabric components.
2.5.6 (1 feature): This release adds a `name()` function to the accelerator interface and removes support for the deprecated lightning-habana package.
2.5.5 (6 fixes): This patch release for PyTorch Lightning and Lightning Fabric focuses on bug fixes, including issues with `LightningCLI`, `ModelCheckpoint` saving logic, and progress bar resetting. It also includes updates for PyTorch 2.8 compatibility.
2.5.4 (5 fixes, 1 feature): This patch release for PyTorch Lightning focuses on bug fixes across checkpointing, callbacks, and strategy integrations. Lightning Fabric also adds support for NVIDIA H200 GPUs.
2.5.3 (13 fixes, 5 features): This release brings numerous bug fixes across PyTorch Lightning and Lightning Fabric, including improvements to checkpointing, logging, profiling, and progress bar rendering. New features include more flexible `ModelCheckpoint` options and handling of `training_step` return values.
2.5.2 (8 fixes, 1 feature): This release introduces the `toggled_optimizer` context manager on LightningModule and resolves several bugs related to CLI integration, DDP synchronization, and checkpointing. Users are advised to update `fsspec` for cross-device checkpointing.
2.5.1.post0: A post-release update following version 2.5.1, with details available in the linked comparison.
2.5.1 (10 fixes, 4 features): This release enhances logging integrations such as MLflow and CometML, allows customization of LightningCLI argument parsing, and fixes several bugs related to logging latency, checkpoint resumption, and logger behavior. Legacy support for `lightning run model` has been removed in favor of `fabric run`.
Common Errors
FileNotFoundError (4 reports): FileNotFoundError in PyTorch Lightning often arises when file paths used for saving checkpoints, configurations, or logs are invalid or the destination directory does not exist. To fix this, ensure all target directories exist before writing to them, creating them with `os.makedirs(path, exist_ok=True)` if needed, and validate that paths are correctly formed, especially absolute paths or paths with special characters across operating systems. Use `os.path.join` for robust path construction.
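The directory-creation fix above can be sketched as follows; the checkpoint path used here is a hypothetical example, not one Lightning produces itself:

```python
import os

def ensure_parent_dir(path):
    """Create the parent directory of `path` if it does not already exist."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # no error if the directory exists
    return path

# Hypothetical checkpoint path, built with os.path.join for portability.
ckpt_path = ensure_parent_dir(os.path.join("lightning_logs", "run_01", "best.ckpt"))
```

Calling this once before handing the path to a checkpoint callback or logger avoids the error regardless of whether the run directory already exists.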
NotImplementedError (2 reports): NotImplementedError in PyTorch Lightning usually arises when a required method, such as `training_step`, `configure_optimizers`, or a hook a callback expects, is not defined in your LightningModule or Callback. Resolve this by overriding all necessary methods in your LightningModule or Callback classes with your custom logic, paying close attention to the expected inputs and outputs of each method as defined in the PyTorch Lightning documentation. When using callbacks, verify that you are overriding the intended hook signatures rather than relying on inherited no-op implementations.
ProcessExitedException (2 reports): ProcessExitedException in PyTorch Lightning tests often arises from unexpected process termination in multi-processing scenarios such as `ddp_fork`. It commonly stems from resource exhaustion, unhandled exceptions in child processes, or conflicts with system-level libraries. To resolve it, ensure adequate system resources (RAM, CPU), implement robust error handling within child processes, and check for library incompatibilities, especially with multiprocessing on macOS.
OutOfMemoryError (2 reports): OutOfMemoryError in PyTorch Lightning typically occurs when the GPU runs out of memory during training. Fix this by reducing the `batch_size` in your DataLoader, using gradient accumulation via the Trainer's `accumulate_grad_batches` argument, or enabling mixed precision (for example, `precision="16-mixed"` in the Trainer). Periodically calling `torch.cuda.empty_cache()` can release cached memory back to the driver, though it does not offload computation to the CPU. For further relief, consider GPUs with more memory or distributed training.
DistNetworkError (1 report): DistNetworkError in PyTorch Lightning distributed tests often arises from address conflicts, specifically the `EADDRINUSE` error, indicating a port is already in use. To fix it, pick an available port by setting the `MASTER_PORT` environment variable, e.g. `os.environ["MASTER_PORT"] = str(find_free_port())`, or configure the TCP store to find an open port automatically, avoiding address collisions during distributed initialization.
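A possible implementation of the `find_free_port` helper named above, using only the standard library:

```python
import os
import socket

def find_free_port():
    """Ask the OS for a currently unused TCP port.

    Note: the port is released when the socket closes, so there is a
    small race window before the distributed store binds it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]

# Point distributed initialization at the free port before launching workers.
os.environ["MASTER_PORT"] = str(find_free_port())
```

Set the variable before the Trainer spawns worker processes so every rank inherits the same port.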
WandbAttachFailedError (1 report): WandbAttachFailedError in PyTorch Lightning often arises when Weights & Biases is initialized outside the main process in a distributed training setting, especially with TPUs, which interferes with proper experiment tracking. To fix this, initialize wandb only on the main process (rank 0), for example with a conditional check like `if self.trainer.global_rank == 0: wandb.init(...)`, or use PyTorch Lightning's built-in WandbLogger, which handles this automatically.
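The rank-zero guard can be sketched generically without importing wandb; reading `RANK` from the environment is an assumption that matches common torch.distributed launchers, and `init_on_rank_zero` is a hypothetical helper name:

```python
import os

def init_on_rank_zero(init_fn):
    """Run a tracking-init callable (e.g. `lambda: wandb.init(...)`) only on
    the main process; other ranks return None without initializing."""
    rank = int(os.environ.get("RANK", "0"))  # launcher-set rank, default 0
    if rank == 0:
        return init_fn()
    return None
```

Inside a LightningModule you would instead check `self.trainer.global_rank == 0` as described above, or let WandbLogger apply this guard for you.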
Related Data & ML Packages
TensorFlow: An Open Source Machine Learning Framework for Everyone
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
scikit-learn: machine learning in Python
pandas: Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Streamlit — A faster way to build and share data apps.