How to format my image inputs the right way to make predictions with a PyTorch Lightning Module?

So far, I have created my classifier class, which inherits from the PyTorch Lightning Module. Training and validation run smoothly, taking batches from the train_loader and val_loader created earlier. Now I want to make predictions, but I cannot figure out the correct way to do it. Should I use my loaders, or should I iterate through the images in my directory? In the latter case, what transformations should I apply so the images are in the right format for my model to make predictions?

train_loader = DataLoader(train_ds, 
                          batch_size=32,
                          collate_fn=collator, 
                          num_workers=4, 
                          shuffle=True)  # shuffle is mutually exclusive with sampler
val_loader = DataLoader(val_ds, 
                        batch_size=32, 
                        collate_fn=collator, 
                        num_workers=4)

import torch
import pytorch_lightning as pl
from torchvision import models
from torch.nn.functional import cross_entropy
from torchmetrics.functional import accuracy  # newer torchmetrics versions also need a task= argument

class ImageClassifier(pl.LightningModule):
    def __init__(self, num_classes=3, lr=1e-3, weight_decay=2e-4, start_finetuning_backbone_at_epoch=5):
        super().__init__()
        self.save_hyperparameters()
                
        self.backbone = models.resnet50(pretrained=True)  # resnet50
        # .fc.out_features if backbone == "resnet50"
        # .classifier[-1].out_features if backbone == "vgg16"
        self.finetune_layer = torch.nn.Linear(self.backbone.fc.out_features, self.hparams.num_classes)
        
    def forward(self, x):
        # use forward for inference/predictions
        # Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7]
        with torch.no_grad():
            features = self.backbone(x)
            preds = self.finetune_layer(features)
        return preds

    def training_step(self, batch, batch_idx):
        # return the loss given a batch: this has a computational graph attached to it: optimization
        x = batch["pixel_values"]
        y = batch["labels"]
        if self.trainer.current_epoch < self.hparams.start_finetuning_backbone_at_epoch:
            with torch.no_grad():
                features = self.backbone(x)
        else:
            features = self.backbone(x)
        preds = self.finetune_layer(features)
        loss = cross_entropy(preds, y)
        self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)  # lightning detaches your loss graph and uses its value
        self.log('train_acc', accuracy(preds, y))
        return loss
...

Right now, I am trying to make predictions using the code below, though I am not sure how I should format "x":

import torch
from torchvision import transforms
from PIL import Image

img_dir = "my/image/path"
img = Image.open(img_dir)
convert_tensor = transforms.ToTensor()
x = convert_tensor(img)

model = ImageClassifier()
model(x)

When I run that code, I get the error below:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 669, 503] instead
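From the message, I gather that the first convolution layer expects a 4-D batch of shape [N, C, H, W], while transforms.ToTensor() produces a single 3-D [C, H, W] tensor. A minimal check with a dummy tensor of the same size:

```python
import torch

# ToTensor() yields a 3-D [C, H, W] tensor; conv2d weights expect [N, C, H, W]
x = torch.randn(3, 669, 503)   # stand-in for the transformed image
batched = x.unsqueeze(0)       # add the batch dimension -> [1, 3, 669, 503]
print(batched.shape)           # torch.Size([1, 3, 669, 503])
```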

If you could give some guidance on best practices as regard to this, I would be super grateful!



Solution 1:[1]

I have found the answer; in case it is useful to anyone, I will post it here:

import torch
from torchvision import transforms
from PIL import Image


def image_loader(img_path):
    """Load an image and return it as a [1, C, H, W] float tensor."""
    imsize = 256
    loader = transforms.Compose([
        transforms.Resize(imsize),
        transforms.ToTensor(),
        # a torchvision ResNet pretrained on ImageNet expects this normalization:
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    image = Image.open(img_path).convert("RGB")
    image = loader(image).float()
    image = image.unsqueeze(0)  # add the batch dimension the conv layers expect
    return image  # .to("cuda") if running on GPU

ckpt_path = "lightning_logs/epoch=2-step=125.ckpt"
model = ImageClassifier.load_from_checkpoint(ckpt_path)
model.eval()
model.freeze()  # disables gradients for all parameters
# model(x) calls the .forward method

def predict_from_img_path(img_path):
    image = image_loader(img_path)
    logits = model(image)                     # [1, num_classes]
    pred = torch.argmax(logits, dim=1).item()
    pred_label = id2label[str(pred)]          # id2label: mapping from class id to label name
    return pred_label
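As for the other part of the question, you can also predict from a DataLoader: build one with the same transforms as validation, put the model in eval mode, and collect predictions batch by batch. A minimal sketch with a hypothetical stand-in model (any module mapping [N, 3, H, W] to [N, num_classes] works the same way):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical stand-in for the trained classifier, kept tiny so the sketch runs
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 8 * 8, 3),
)
model.eval()

images = torch.randn(10, 3, 8, 8)   # stand-in for preprocessed images
loader = DataLoader(TensorDataset(images), batch_size=4)

all_preds = []
with torch.no_grad():               # no computational graph needed for inference
    for (x,) in loader:
        logits = model(x)                            # [batch, num_classes]
        all_preds.append(torch.argmax(logits, dim=1))
preds = torch.cat(all_preds)        # one predicted class id per image
```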

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Quentin Bracq