Change8

PyTorch Lightning

Data & ML

Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.

Latest: 2.6.1 · 9 releases · 1 breaking change · 8 common errors · View on GitHub

Release History

2.6.1 · Breaking · 9 fixes · 3 features
Jan 30, 2026

This patch introduces method chaining for freezing/unfreezing modules and adds litlogger integration. It also removes support for Python 3.9 and fixes several bugs related to checkpointing, hyperparameter saving, and distributed sampling.

2.6.0 · 14 fixes · 7 features
Nov 28, 2025

Version 2.6.0 introduces several new features, such as `WeightAveraging` callbacks and Torch-TensorRT integration, alongside numerous bug fixes across PyTorch Lightning and Fabric components.

2.5.6 · 1 feature
Nov 5, 2025

This release introduces a new `name()` function to the accelerator interface and removes support for the deprecated lightning-habana package.

2.5.5 · 6 fixes
Sep 5, 2025

This patch release for PyTorch Lightning and Lightning Fabric focuses on bug fixes, including issues with `LightningCLI`, `ModelCheckpoint` saving logic, and progress bar resetting. It also includes updates for PyTorch 2.8 compatibility.

2.5.4 · 5 fixes · 1 feature
Aug 29, 2025

This patch release for PyTorch Lightning focuses on bug fixes across checkpointing, callbacks, and strategy integrations. Lightning Fabric also added support for NVIDIA H200 GPUs.

2.5.3 · 13 fixes · 5 features
Aug 13, 2025

This release brings numerous bug fixes across PyTorch Lightning and Lightning Fabric, including improvements to checkpointing, logging, profiling, and progress bar rendering. New features include more flexibility in `ModelCheckpoint` options and handling of `training_step` returns.

2.5.2 · 8 fixes · 1 feature
Jun 20, 2025

This release introduces the `toggled_optimizer` context manager to LightningModule and resolves several bugs related to CLI integration, DDP synchronization, and checkpointing. Users are advised to update `fsspec` for cross-device checkpointing.

2.5.1.post0
Apr 25, 2025

This is a post-release update (2.5.1.post0) following version 2.5.1.

2.5.1 · 10 fixes · 4 features
Mar 19, 2025

This release introduces enhancements for logging integrations like MLflow and CometML, allows customization of LightningCLI argument parsing, and fixes several bugs related to logging latency, checkpoint resumption, and logger behavior. Legacy support for `lightning run model` has been removed in favor of `fabric run`.

Common Errors

FileNotFoundError · 4 reports

FileNotFoundError in PyTorch Lightning often arises when the paths used for saving checkpoints, configurations, or logs are invalid or the destination directory does not exist. To fix this, create any missing directories before writing to them with `os.makedirs(path, exist_ok=True)`, and validate that paths are well-formed, especially absolute paths or paths containing special characters on different operating systems. Use `os.path.join` for robust, portable path construction.
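A minimal sketch of the directory-creation fix. `ensure_checkpoint_dir` is a hypothetical helper, not a Lightning API; it simply combines `os.path.join` and `os.makedirs` so the destination exists before anything writes to it:

```python
import os
import tempfile

def ensure_checkpoint_dir(base_dir: str, run_name: str) -> str:
    """Build a checkpoint path portably and create the directory tree
    up front, so later writes cannot fail on a missing folder."""
    ckpt_dir = os.path.join(base_dir, run_name, "checkpoints")
    os.makedirs(ckpt_dir, exist_ok=True)  # idempotent: safe to call repeatedly
    return ckpt_dir

# Create the directory before handing the path to any checkpoint writer.
with tempfile.TemporaryDirectory() as tmp:
    path = ensure_checkpoint_dir(tmp, "experiment-1")
    created = os.path.isdir(path)
    print(created)  # → True
```

Because `exist_ok=True` makes the call idempotent, the helper can run at the start of every job without special-casing resumed runs.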

NotImplementedError · 2 reports

NotImplementedError in PyTorch Lightning usually arises when a required method, such as `training_step` or `configure_optimizers`, is not defined in your LightningModule, or when a Callback expects a hook you have not provided. Resolve it by overriding every required method with your own logic, matching the inputs and outputs documented for each hook. If you use custom callbacks, confirm that the hook names you override exactly match those defined on the base `Callback` class; a misspelled name silently falls back to the base implementation.
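The hook contract can be illustrated without importing Lightning. `TrainerHooks`, `BrokenModel`, and `FixedModel` below are hypothetical stand-ins for the LightningModule pattern: the base class declares required hooks that raise until a subclass overrides them, which is exactly where the error comes from:

```python
class TrainerHooks:
    """Simplified stand-in for the LightningModule hook contract."""

    def training_step(self, batch, batch_idx):
        raise NotImplementedError("override training_step in your subclass")

    def configure_optimizers(self):
        raise NotImplementedError("override configure_optimizers in your subclass")

class BrokenModel(TrainerHooks):
    pass  # forgot to define the hooks -> NotImplementedError at runtime

class FixedModel(TrainerHooks):
    def training_step(self, batch, batch_idx):
        return sum(batch)  # placeholder "loss" for illustration

    def configure_optimizers(self):
        return "sgd"       # placeholder optimizer for illustration

try:
    BrokenModel().training_step([1, 2], 0)
except NotImplementedError as err:
    print("broken:", err)

print("fixed:", FixedModel().training_step([1, 2], 0))  # → fixed: 3
```

In real code the fix is the same: define every hook the Trainer will call on your subclass before handing the model to `Trainer.fit`.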

ProcessExitedException · 2 reports

ProcessExitedException in PyTorch Lightning often arises when a child process terminates unexpectedly in multi-process scenarios such as `ddp_fork`. Common causes are resource exhaustion, unhandled exceptions inside child processes, or conflicts with system-level libraries. To resolve it, ensure adequate system resources (RAM, CPU), add robust error handling inside the child processes, and check for library incompatibilities, especially with multiprocessing on macOS.
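One way to keep a child process from dying on an unhandled exception is to wrap its work so failures are reported as values instead of crashing the process. `safe_worker` and `flaky_step` are hypothetical names for illustration; the wrapper is called directly here to show the contract, but in real multi-process code it would wrap the function passed as the process target:

```python
import traceback

def safe_worker(fn, *args):
    """Run fn and catch everything, so a child process exits cleanly
    and reports the failure instead of terminating mid-run (an
    unhandled exception is a common cause of ProcessExitedException)."""
    try:
        return ("ok", fn(*args))
    except Exception:
        return ("error", traceback.format_exc())

def flaky_step(x):
    if x < 0:
        raise ValueError("negative input")
    return x * 2

print(safe_worker(flaky_step, 3))    # → ('ok', 6)
status, detail = safe_worker(flaky_step, -1)
print(status)                        # → error
```

Returning the traceback as a string also makes the failure visible in the parent, which otherwise only sees an opaque non-zero exit code.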

OutOfMemoryError · 2 reports

OutOfMemoryError in PyTorch Lightning typically occurs when the GPU runs out of memory during training. Fix it by reducing the `batch_size` in your DataLoader, using gradient accumulation via `accumulate_grad_batches` in the Trainer, or lowering memory pressure with mixed-precision training (`precision="16-mixed"` in the Trainer). Note that `torch.cuda.empty_cache()` releases cached allocator memory back to the driver but does not offload work to the CPU. Consider larger GPUs or distributed training for further memory relief.
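The arithmetic behind gradient accumulation can be sketched without torch: averaging per-micro-batch gradients reproduces the full-batch gradient, which is why `accumulate_grad_batches` preserves the effective batch size while shrinking peak memory. `full_batch_grad` and `accumulated_grad` below are illustrative stand-ins, assuming equal-sized micro-batches and per-sample gradients represented as plain floats:

```python
def full_batch_grad(samples):
    """Gradient of the whole batch: mean over all per-sample gradients."""
    return sum(samples) / len(samples)

def accumulated_grad(samples, accumulate_batches):
    """Split the batch into equal micro-batches, average each one,
    then average the partial results -- only one micro-batch needs
    to be resident in memory at a time."""
    size = len(samples) // accumulate_batches
    partials = [
        sum(samples[i * size:(i + 1) * size]) / size
        for i in range(accumulate_batches)
    ]
    return sum(partials) / accumulate_batches

grads = [0.5, -1.0, 2.0, 3.5, -0.5, 1.5, 0.0, 2.0]
print(full_batch_grad(grads))      # same effective gradient,
print(accumulated_grad(grads, 4))  # roughly a quarter of the peak memory
```

The two results are identical because the mean of equal-sized micro-batch means equals the full-batch mean; in real training each micro-batch loss is scaled by `1 / accumulate_batches` before `backward()` to the same effect.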

DistNetworkError · 1 report

DistNetworkError in PyTorch Lightning distributed runs often arises from address conflicts, specifically the `EADDRINUSE` error, indicating the rendezvous port is already in use. To fix it, point the job at a free port by setting the `MASTER_PORT` environment variable, for example `os.environ["MASTER_PORT"] = str(find_free_port())`, or configure the TCP store to pick an open port automatically, avoiding address collisions during distributed initialization.
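A common implementation of the `find_free_port()` helper mentioned above (the name is illustrative, not a Lightning API) asks the OS for an ephemeral port by binding to port 0:

```python
import os
import socket

def find_free_port() -> int:
    """Bind to port 0 so the OS picks a free ephemeral port,
    then release it for the distributed init to claim."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

# Set this before torch.distributed / Lightning initializes the process group.
os.environ["MASTER_PORT"] = str(find_free_port())
print(os.environ["MASTER_PORT"])
```

There is a small race window between releasing the port and the process group binding it, so this mitigates rather than fully eliminates `EADDRINUSE`; running jobs with disjoint port ranges closes the gap further.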

WandbAttachFailedError · 1 report

WandbAttachFailedError in PyTorch Lightning often arises when wandb is initialized outside the main process in a distributed training setting, especially with TPUs, which interferes with proper experiment tracking. To fix this, initialize wandb only on the main process (rank 0), for example with a conditional check like `if self.trainer.global_rank == 0: wandb.init(...)`, or use PyTorch Lightning's built-in `WandbLogger`, which handles this automatically.
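Outside of a LightningModule (where `self.trainer.global_rank` is available), the rank check can be based on the `RANK` environment variable that launchers such as torchrun set. `is_global_zero` is a hypothetical helper for illustration:

```python
import os

def is_global_zero() -> bool:
    """Rank-0 check via the RANK env var set by torchrun and similar
    launchers; defaults to rank 0 when absent (single-process run)."""
    return int(os.environ.get("RANK", "0")) == 0

# Guard any manual wandb.init() with the rank check; inside a
# LightningModule, test self.trainer.global_rank == 0 instead.
if is_global_zero():
    print("rank 0: would call wandb.init(...) here")
```

Using `WandbLogger` avoids the guard entirely, since its logging methods are already decorated to run only on rank 0.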
