How to interpret the output format of a model?
Noob here; it's hard to explain my question without an example, so I'll use a model on the MNIST data that classifies digits from number images.
# Assumed imports and a standard MNIST transform (the original snippet
# does not show them or define `transform`)
import torch
from torch import nn
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# Load data
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))
Why does the model end up with a 64 (row) x 10 (column) matrix?
I thought nn.Linear(64, 10) means a layer that maps 64 input neurons to 10 neurons. Shouldn't the output be an array of 10 probabilities?
And why does the output activation function use dim=1, not dim=0?
Isn't each row of 10 columns for one epoch? Shouldn't LogSoftmax be used to calculate the probability of each digit?
I'm ...lost.
I've spent two hours on this and still can't find the answer; sorry for the noob question!
Solution 1:[1]
We usually have our data in the form of (BATCH SIZE, INPUT SIZE), which in your case is (64, 784).
What this means is that in every batch you have 64 images and each image has 784 features.
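To make the (64, 784) shape concrete, here is a small sketch. It uses a random tensor as a stand-in for one MNIST batch (a real batch from the DataLoader arrives as (64, 1, 28, 28) and is flattened before entering the model; the variable names here are illustrative):

```python
import torch

# A batch from the DataLoader has shape (batch_size, 1, 28, 28);
# flattening each 28x28 image yields 784 features per image.
images = torch.randn(64, 1, 28, 28)      # stand-in for one MNIST batch
flat = images.view(images.shape[0], -1)  # keep the batch dim, flatten the rest
print(flat.shape)                        # torch.Size([64, 784])
```

This flattened (64, 784) tensor is exactly what the first Linear layer expects.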
Here is what printing your model shows:
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))
print(model)
# Sequential(
# (0): Linear(in_features=784, out_features=128, bias=True)
# (1): ReLU()
# (2): Linear(in_features=128, out_features=64, bias=True)
# (3): ReLU()
# (4): Linear(in_features=64, out_features=10, bias=True)
# (5): LogSoftmax(dim=1)
# )
Let's go through how the data will flow through this model.
- You have an input of shape (64, 784).
- It passes through the first Linear layer, where each image's 784 features are converted to 128, so the output has shape (64, 128).
- ReLU does not change the shape, just the values, so the shape is again (64, 128).
- The next Linear layer converts 128 features to 64, so the output shape is now (64, 64).
- Again, the ReLU layer only changes values, so the shape is still (64, 64).
- The last Linear layer maps 64 input features to 10 output features, so the shape is now (64, 10).
- Lastly we have the LogSoftmax layer. Here we pass dim=1 because we want to compute the output probability for each of the 10 possible digits, for each of the 64 images in our batch. dim=0 is the batch dimension and dim=1 holds the outputs for the digits, which is why we provide dim=1. After this, your output still has shape (64, 10).
Therefore, at the end, each image in the batch has a probability for each of the 10 digits.
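The shape flow described above can be verified directly by feeding a stand-in batch through the model one layer at a time (the random input here is just a placeholder for a flattened MNIST batch):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))

x = torch.randn(64, 784)  # stand-in batch: 64 images, 784 features each
for layer in model:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# Linear (64, 128)
# ReLU (64, 128)
# Linear (64, 64)
# ReLU (64, 64)
# Linear (64, 10)
# LogSoftmax (64, 10)
```

Note that only the Linear layers change the second dimension; ReLU and LogSoftmax leave the shape untouched.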
I thought nn.Linear(64, 10) means a layer that maps 64 input neurons to 10 neurons.
That is correct. Another point to remember is that the batch dimension is not specified in the layers of the model. We define layers to operate on each image. Your second-to-last Linear layer outputs 64 values for an image, so the last Linear layer converts them to 10 values, and LogSoftmax is then applied.
This operation is simply repeated for all 64 images in a batch, efficiently, using matrix operations. You might be confusing your batch_size=64 with the input_features=64 of the Linear layer, which are entirely unrelated.
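Since the model ends in LogSoftmax, its (64, 10) output holds log-probabilities. A common way to read them back is to exponentiate to get ordinary probabilities and take the argmax per row for the predicted digit (the tensor here is a hypothetical stand-in for `model(images)`):

```python
import torch
from torch import nn

# Hypothetical log-probability output of shape (64, 10); in practice
# this would be model(flattened_images).
logps = nn.LogSoftmax(dim=1)(torch.randn(64, 10))

probs = logps.exp()          # back to ordinary probabilities
print(probs.sum(dim=1))      # each of the 64 rows sums to ~1.0
preds = probs.argmax(dim=1)  # predicted digit for each image
print(preds.shape)           # torch.Size([64])
```

Each row summing to 1 confirms that dim=1 was the right axis: softmax normalized across the 10 digit scores of each image, not across the batch.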
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pragun |
