PyTorch Lightning with TensorBoard

Last Updated : 24 Sep, 2024

PyTorch Lightning is a popular deep learning framework built on top of PyTorch. It simplifies training and testing PyTorch models, and it is especially useful for distributed training because a model can be trained across devices without much extra code. To inspect the metrics interactively, we use TensorBoard, a visualization tool that plots values such as the loss and accuracy of a model and helps with debugging.

Integrating PyTorch Lightning with TensorBoard, a powerful visualization tool, enhances the ability to monitor metrics, model performance, and training progress in real time.

Setting Up TensorBoard with PyTorch Lightning

To set up PyTorch Lightning with TensorBoard, first make sure that PyTorch is installed. The conda command below installs the CPU-only build; PyTorch can also be installed with pip.

conda install pytorch torchvision torchaudio cpuonly -c pytorch

After installing PyTorch, install PyTorch Lightning with pip:

pip install pytorch-lightning

Next, install TensorBoard with the command below:

pip install tensorboard
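
Once the three installs finish, a quick check can confirm that the packages are importable. The snippet below is a minimal sketch that uses only the standard library; the helper name check_packages is hypothetical and not part of any of these libraries. Note that it takes import names, so pytorch-lightning is checked as pytorch_lightning.

```python
import importlib.util


def check_packages(names):
    """Return {import_name: True/False} depending on whether each package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names, not pip names: pytorch-lightning is imported as pytorch_lightning.
    status = check_packages(["torch", "pytorch_lightning", "tensorboard"])
    for name, found in status.items():
        print(f"{name}: {'installed' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed again with the matching command above.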

Why Use PyTorch Lightning with TensorBoard?

Using PyTorch Lightning and TensorBoard together has multiple benefits:

Automated Logging: PyTorch Lightning forwards every metric passed to self.log to its logger, so no manual logging code is needed to monitor the training process.
Visualization: TensorBoard plots training progress over time, which makes debugging and analysis more efficient.
Scalability: PyTorch Lightning scales models across multiple GPUs and TPUs, while TensorBoard keeps track of the metrics logged from these distributed runs.

Logging Metrics with PyTorch Lightning to TensorBoard

TensorBoard works hand in hand with PyTorch Lightning. Whatever metrics we log with PyTorch Lightning are captured by TensorBoard, which turns them into interactive visualizations and serves them on localhost. To record a value, we call the self.log method, which passes the value to the TensorBoard logger.

Python

# Body of a LightningModule step method (here, test_step)
x, y = batch
y_hat = self(x)
loss = self.loss_fn(y_hat, y)
acc = (y_hat.argmax(dim=1) == y).float().mean()
self.log('test_loss', loss, prog_bar=True)
self.log('test_acc', acc, prog_bar=True)

The self.log method can be called in the training, validation, and test steps.

After running the model, we launch TensorBoard to view the metrics and their visualizations.
The dashboard is served at http://localhost:6006/ by default. TensorBoard is also useful for comparing several versions of a model.

The command to launch TensorBoard is as follows:

tensorboard --logdir=lightning_logs/

[Figure: Logging Metrics with PyTorch Lightning]

[Figure: Sample - Visualizing Model Training in TensorBoard]

Example: Neural Network with PyTorch Lightning and TensorBoard

This example uses the MNIST dataset. The model is defined as a PyTorch Lightning LightningModule class.

The class has three fully connected layers and defines the forward pass, the optimizer, a training step, a validation step to monitor overfitting, and a test step.
We apply transformations to the dataset, such as converting the images to tensors and normalizing them. Finally, we create a Trainer object to train and test the model.
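
To see what the normalization step does, the sketch below applies the per-pixel formula (x - mean) / std by hand, using the same mean and standard deviation of 0.5 as the example. The normalize function is a hypothetical helper written for illustration; the example itself uses the transforms from torchvision.

```python
import torch


def normalize(pixels, mean=0.5, std=0.5):
    """Apply the per-pixel normalization formula: (x - mean) / std."""
    return (pixels - mean) / std


# After conversion to tensors, MNIST pixel values lie in [0, 1].
pixels = torch.tensor([0.0, 0.5, 1.0])
normalized = normalize(pixels)  # values -1.0, 0.0, 1.0
print(normalized)
```

With these settings the pixel range [0, 1] is mapped to [-1, 1], which centers the inputs around zero.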

Batch Size: 32
Epochs: 5
Learning Rate: 0.001
Optimizer: Adam
Activation Function: ReLU
Loss: Cross Entropy
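
The loss and accuracy logged in the example are both computed from the raw outputs (logits) of the last layer. The sketch below works through a batch of four made-up samples with three classes; the logits, the targets, and the helper name batch_accuracy are assumptions chosen for illustration.

```python
import torch
from torch import nn


def batch_accuracy(logits, targets):
    """Fraction of samples whose highest-scoring class equals the target."""
    return (logits.argmax(dim=1) == targets).float().mean()


# Made-up logits for 4 samples and 3 classes
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.8, 0.1],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 1, 1, 2])

# CrossEntropyLoss expects raw logits, not softmax probabilities
loss = nn.CrossEntropyLoss()(logits, targets)
# Predicted classes are 0, 1, 0, 2, so 3 of 4 are correct
acc = batch_accuracy(logits, targets)
print(loss.item(), acc.item())
```

The third sample is misclassified (class 0 is predicted instead of class 1), so the accuracy for this batch is 0.75.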

Python

import pytorch_lightning as pl
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import transforms, datasets


# Step 1: Define the LightningModule
class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(28 * 28, 128)
        self.layer_2 = nn.Linear(128, 256)
        self.layer_3 = nn.Linear(256, 10)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x):
        # Flatten the input (28x28 images to 784)
        x = x.view(x.size(0), -1)
        x = torch.relu(self.layer_1(x))
        x = torch.relu(self.layer_2(x))
        x = self.layer_3(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.loss_fn(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()  # Accuracy for training
        self.log('train_loss', loss)
        self.log('train_acc', acc)  # Logging training accuracy
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.loss_fn(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()  # Validation accuracy
        self.log('val_loss', loss, prog_bar=True)
        self.log('val_acc', acc, prog_bar=True)  # Logging validation accuracy

    def test_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.loss_fn(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()  # Testing accuracy
        self.log('test_loss', loss, prog_bar=True)
        self.log('test_acc', acc, prog_bar=True)  # Logging test accuracy

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# Step 2: Prepare Data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
mnist_train = datasets.MNIST(root='.', train=True, download=True, transform=transform)
mnist_test = datasets.MNIST(root='.', train=False, download=True, transform=transform)

train_loader = DataLoader(mnist_train, batch_size=32)
test_loader = DataLoader(mnist_test, batch_size=32)

# Step 3: Create Trainer and Train Model
model = LitModel()
trainer = pl.Trainer(max_epochs=5, accelerator='cpu')

# Step 4: Train the model
# Note: the test loader is also passed as the validation loader here
trainer.fit(model, train_loader, test_loader)

# Step 5: Test the model
trainer.test(model, test_loader)

Output:

[Figure: Neural Network on the MNIST Dataset with PyTorch Lightning]

After training and testing are complete, run the command 'tensorboard --logdir=lightning_logs/', since all the logs are stored in the lightning_logs directory. Then open the URL that the command prints.

As we can see, the training loss decreases as the number of training steps increases, and the same holds for the test data. TensorBoard also shows the relative time taken to train the model. The overall testing accuracy of the model is 96.67%.
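
The logged values can also be read back in code, for example to confirm the trend without opening the dashboard. The sketch below uses the EventAccumulator class that ships with the tensorboard package; read_scalar is a hypothetical helper, and the run directory in the usage note is an assumption about where Lightning wrote the first run.

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator


def read_scalar(log_dir, tag):
    """Return [(step, value), ...] for one scalar tag stored in log_dir."""
    accumulator = EventAccumulator(log_dir)
    accumulator.Reload()  # parse the event files on disk
    return [(event.step, event.value) for event in accumulator.Scalars(tag)]
```

For instance, read_scalar("lightning_logs/version_0", "train_loss") would return the training loss per logged step, assuming the first run was saved under version_0.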

Conclusion

TensorBoard is a powerful library because it captures all the logs generated during the training and testing of a model. By combining it with a PyTorch Lightning model and simply calling the self.log method, all the records are captured, and a single command serves those logs on localhost. TensorBoard then turns the logs into interactive visualizations, which reduces the amount of plotting code we need to write.


baidehi1874
