How to format my image inputs correctly to make predictions with a PyTorch Lightning Module?
So far, I have created my classifier class, which inherits from the PyTorch Lightning Module. Training and validation run smoothly, taking batches from the previously created train_loader and val_loader. Now I want to make predictions, but I cannot figure out the correct way to do it. Should I use my loaders, or should I iterate through the images in my directory? In the latter case, what transformations should I apply so that the images are in the right format for my model to make predictions?
train_loader = DataLoader(train_ds,
                          batch_size=32,
                          collate_fn=collator,
                          num_workers=4,
                          shuffle=True)  # shuffle is mutually exclusive with sampler
val_loader = DataLoader(val_ds,
                        batch_size=32,
                        collate_fn=collator,
                        num_workers=4)
class ImageClassifier(pl.LightningModule):
    def __init__(self, num_classes=3, lr=1e-3, weight_decay=2e-4, start_finetuning_backbone_at_epoch=5):
        super().__init__()
        self.save_hyperparameters()
        self.backbone = models.resnet50(pretrained=True)  # resnet50
        # .fc.out_features if backbone == "resnet50"
        # .classifier[-1].out_features if backbone == "vgg16"
        self.finetune_layer = torch.nn.Linear(self.backbone.fc.out_features, self.hparams.num_classes)

    def forward(self, x):
        # use forward for inference/predictions
        # Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7]
        with torch.no_grad():
            features = self.backbone(x)
        preds = self.finetune_layer(features)
        return preds
    def training_step(self, batch, batch_idx):
        # return the loss for a batch; it carries the computational graph used for optimization
        x = batch["pixel_values"]
        y = batch["labels"]
        if self.trainer.current_epoch < self.hparams.start_finetuning_backbone_at_epoch:
            with torch.no_grad():
                features = self.backbone(x)
        else:
            features = self.backbone(x)
        preds = self.finetune_layer(features)
        loss = cross_entropy(preds, y)
        self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)  # Lightning detaches the loss graph and logs its value
        self.log('train_acc', accuracy(preds, y))
        return loss
    ...
Right now, I am trying to make predictions using the code below, though I am not sure how I should format "x":
import torch
from torchvision import transforms
from PIL import Image
img_dir = "my/image/path"
img = Image.open(img_dir)
convert_tensor = transforms.ToTensor()
x = convert_tensor(img)
model = ImageClassifier()
model(x)
When I run that code, I get the below error message:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 669, 503] instead
If you could give some guidance on best practices in this regard, I would be very grateful!
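(Editor's note: the error message itself hints at the fix. ToTensor() produces a 3-D [C, H, W] tensor, while the first conv layer's [64, 3, 7, 7] weight requires a 4-D [N, C, H, W] batch. A minimal demonstration, using a random tensor as a stand-in for the loaded image:)

```python
import torch

# ToTensor() yields a 3-D [C, H, W] tensor, but Conv2d weights such as
# [64, 3, 7, 7] expect a 4-D [N, C, H, W] batch.
x = torch.rand(3, 669, 503)   # same shape as in the error message
print(x.shape)                # torch.Size([3, 669, 503])
x = x.unsqueeze(0)            # prepend a batch dimension of size 1
print(x.shape)                # torch.Size([1, 3, 669, 503])
```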
Solution 1:[1]
I have found my answer, so in case it is useful to anyone, I will post it here:
import torch
from torchvision import transforms
from PIL import Image

def image_loader(img_path):
    """Load an image and return a tensor ready for the model."""
    imsize = 256
    loader = transforms.Compose([transforms.Resize(imsize), transforms.ToTensor()])
    image = Image.open(img_path).convert("RGB")  # force 3 channels (drops alpha)
    image = loader(image).float()
    # torch.autograd.Variable is deprecated; plain tensors work, and no
    # gradients are needed for inference
    image = image.unsqueeze(0)  # add the batch dimension the conv layers expect
    return image  # .cuda() # uncomment if you run the model on GPU
ckpt_path = "lightning_logs/epoch=2-step=125.ckpt"  # or e.g. epoch=9-step=419.ckpt
model = ImageClassifier.load_from_checkpoint(ckpt_path)
model.eval()
model.freeze()
# calling model(x) invokes the .forward method

def predict_from_img_path(img_path):
    image = image_loader(img_path)
    logits = model(image)                  # shape [1, num_classes]
    pred = torch.argmax(logits, dim=1).item()
    pred_label = id2label[str(pred)]       # id2label maps class index -> label name
    return pred_label
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Quentin Bracq |
