Flattening the input to nn.MSELoss()
Here's a screenshot of a YouTube video implementing the loss function from the original YOLOv1 research paper.
What I don't understand is the need for torch.flatten() when passing the input to self.mse(), which is, in fact, nn.MSELoss().
The video just says the reason is that nn.MSELoss() expects its input in the shape (a, b), but I don't understand how or why that is.
Video link just in case. [For reference, N is the batch size, S is the grid size (split size)]
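For context, here is a minimal sketch of the pattern in question. The shapes are assumptions following the YOLOv1 conventions above (N = batch size, S = split size, and 30 = B * 5 + C values per grid cell, with B = 2 boxes and C = 20 classes), not the video's exact code:

```python
import torch
import torch.nn as nn

# Assumed YOLOv1-style shapes: N = batch size, S = split size,
# 30 = B * 5 + C values per grid cell (B = 2 boxes, C = 20 classes).
N, S = 16, 7
predictions = torch.randn(N, S, S, 30)
target = torch.randn(N, S, S, 30)

mse = nn.MSELoss()

# Flatten everything after the batch dimension, turning each example into
# one long row: shape (N, S * S * 30) -- the "(a, b)" shape the video mentions.
loss = mse(
    torch.flatten(predictions, start_dim=1),
    torch.flatten(target, start_dim=1),
)
```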
Solution 1:[1]
It helps to go back to the definitions. What is MSE? What is it computing?
MSE = mean squared error.
Here is rough, Pythonic pseudocode to illustrate:

def mse(data, labels):
    total = 0.0
    for x, y in zip(data, labels):  # pair each prediction with its label
        total += (x - y) ** 2       # squared difference for this pair
    return total / len(labels)      # the average squared difference
For each pair of entries it takes the difference of the two numbers, squares it, and after the loop returns the average (or mean) of all those squared differences.
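As a quick sanity check (with made-up numbers), the hand-rolled loop above agrees with nn.MSELoss() on flat, 1-D inputs:

```python
import torch
import torch.nn as nn

# Made-up values purely for illustration.
data = torch.tensor([1.0, 2.0, 3.0])    # "predictions"
labels = torch.tensor([1.5, 2.0, 2.0])  # "targets"

# Hand-rolled MSE, exactly as in the pseudocode above.
manual = sum((x - y) ** 2 for x, y in zip(data, labels)) / len(labels)

# PyTorch's built-in version.
builtin = nn.MSELoss()(data, labels)

print(manual, builtin)  # tensor(0.4167) tensor(0.4167)
```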
To rephrase the question: how would you interpret MSE without flattening? MSE as described and implemented above operates on pairs of scalar entries, so on its own it doesn't mean anything for higher-dimensional entries. If you want to work with the outputs as matrices, you can use other loss functions, such as matrix norms (see the sketch below).
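For instance, here is a sketch of one such alternative, using the Frobenius norm of the difference between two matrix-shaped outputs (the names and shapes are illustrative, not from the video):

```python
import torch

# Illustrative matrix-shaped prediction and target.
pred = torch.randn(4, 4)
target = torch.randn(4, 4)

# Frobenius norm of the difference: a single scalar measuring how far
# apart the two matrices are, without flattening them first.
loss = torch.linalg.matrix_norm(pred - target, ord="fro")
```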
Anyway, I hope that answers your question as to why the flattening is needed.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Steven |
