How do I predict using a trained Image to Emotion PyTorch model?

We found Artemis, a promising project that predicts emotions from art images. The trained model is available for download as a .pt file.

How can I use the model? I want to run it on some images to see whether the predictions are good.

I am loading the model with:

DEFAULT_MODEL_PATH = 'models/artemis/best_model.pt'

model = torch_load_model(DEFAULT_MODEL_PATH, 'cpu')
# c=model.eval()
# print(c)
 
image = Image.open("sample_images/neoclassicism_romantics_art_nouveau.jpg")

# this section to transform I don't know what it does 

trans = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.Resize(32),
    transforms.CenterCrop(32),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
input = trans(image)

input = input.view(1, 3, 32, 32)
output = model(input)
print(output)
prediction = int(torch.max(output.data, 1)[1].numpy())
print(prediction)

I found the transform code above in another SO answer, but being new to ML I don't know what it does.

This code sometimes returns:

tensor([[-2.6041, -1.6840, -1.5014, -2.5007, -3.4555, -2.7803, -2.8510, -1.9448,
         -1.9579]], grad_fn=<LogSoftmaxBackward0>)
2

and sometimes this:

tensor([[-2.8087, -1.4727, -1.5604, -2.5557, -3.7794, -2.9126, -2.6383, -1.8776,
         -2.1111]], grad_fn=<LogSoftmaxBackward0>)
1

I don't know what these values mean.
What am I missing to run a prediction? I would love to see the labels and the predicted values.



Solution 1:[1]

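output is the output of the neural network: one score for each of the predefined classes. Because the model ends in a log-softmax layer (note grad_fn=<LogSoftmaxBackward0>), these scores are log-probabilities rather than probabilities; applying torch.exp(output) converts them to probabilities that sum to 1. The output has 9 entries, so the model distinguishes 9 classes.

Therefore, prediction is the index of the most likely class.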
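To get from the raw scores to labels and probabilities, a minimal sketch follows. It uses the first output printed in the question, which carries grad_fn=<LogSoftmaxBackward0> and therefore holds log-probabilities. The label order in EMOTIONS and the helper rank_emotions are assumptions made for illustration, not taken from the question; check the emotion list that ships with the project before trusting the names.

```python
import math

# ASSUMPTION: label order is illustrative; verify it against the project's own emotion list.
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness", "something else"]


def rank_emotions(log_probs, labels=EMOTIONS):
    """Turn log-probabilities into (label, probability) pairs, most likely first."""
    if len(log_probs) != len(labels):
        raise ValueError("expected one score per label")
    probs = [math.exp(lp) for lp in log_probs]  # exp undoes the log of log-softmax
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# First output from the question; with a real tensor use output[0].tolist()
log_probs = [-2.6041, -1.6840, -1.5014, -2.5007, -3.4555,
             -2.7803, -2.8510, -1.9448, -1.9579]

ranked = rank_emotions(log_probs)
for label, prob in ranked:
    print(f"{label:15s} {prob:.3f}")
```

The top entry corresponds to index 2, the same index the question's code printed for that run. Which emotion that index means depends on the label order being right.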

Also, the network's prediction can change between runs because of randomness in the input transforms and in the network itself (for example, dropout layers left in training mode). I recommend modifying your code like this:

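import torch
from PIL import Image
from torchvision import transforms

# torch_load_model is the loading helper from the Artemis codebase
DEFAULT_MODEL_PATH = 'models/artemis/best_model.pt'

model = torch_load_model(DEFAULT_MODEL_PATH, 'cpu')
model.eval()  # put batchnorm/dropout layers into inference mode

# convert("RGB") guarantees 3 channels even for grayscale or RGBA files
image = Image.open("sample_images/neoclassicism_romantics_art_nouveau.jpg").convert("RGB")

# deterministic preprocessing: no random augmentation at inference time
trans = transforms.Compose([
    transforms.Resize(32),      # shorter side -> 32, aspect ratio kept
    transforms.CenterCrop(32),  # crop to exactly 32x32
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
input = trans(image).unsqueeze(0)  # add batch dimension: (1, 3, 32, 32)

with torch.no_grad():
    output = model(input)
prediction = output.argmax(dim=1).item()
print(output)
print(prediction)
  • model.eval() switches layers such as batchnorm and dropout to their inference behavior
  • Remove transforms.RandomHorizontalFlip(): it is a training-time augmentation and makes the input random
  • Keep transforms.CenterCrop(32): Resize(32) only scales the shorter side to 32 and preserves the aspect ratio, so a non-square image would not be 32x32 without the crop
  • with torch.no_grad() tells torch not to track gradients, which saves memory and time during inference
  • The 32x32 size and the 0.5 normalization constants come from an unrelated answer; for meaningful predictions the preprocessing should match what the model was trained with, so check the Artemis code for the values it uses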
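To see what model.eval() and torch.no_grad() change in practice, here is a toy sketch. The small network toy is a hypothetical stand-in with a dropout layer, not the Artemis model, and the layer sizes are arbitrary; only the 9 outputs and the final log-softmax mirror the question.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in network (NOT the Artemis model): dropout makes it
# stochastic in training mode, log-softmax mirrors the output in the question.
toy = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(16, 9),
    nn.LogSoftmax(dim=1),
)
x = torch.randn(1, 8)

# Training mode: dropout zeroes random units, so two passes usually differ.
toy.train()
train_a, train_b = toy(x), toy(x)

# Inference mode: dropout is disabled and no autograd graph is recorded.
toy.eval()
with torch.no_grad():
    eval_a, eval_b = toy(x), toy(x)

print("train passes identical:", torch.equal(train_a, train_b))
print("eval passes identical: ", torch.equal(eval_a, eval_b))
print("requires_grad (train / eval):", train_a.requires_grad, eval_a.requires_grad)
```

In eval mode the two passes match exactly, which is the behavior you want when checking predictions on sample images.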

Any advice is welcome.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
