Understanding the role of the gradient parameter in torch.backward()
First, say we have a scalar-valued function f in PyTorch (a linear map, for example) from R^n to R. That gives the following pseudocode:
a = torch.ones(n, requires_grad=True)  # n is defined elsewhere
b = f(a)  # assume f is defined somewhere else in the code
b.backward()
The torch docs say that b.backward(torch.FloatTensor([1.])) is equivalent to the last backward() call. In this case, does torch.FloatTensor([1.]) represent the derivative db/db? If not, what exactly does it represent, and why is it 1?
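The equivalence can be checked with a small experiment. This is a minimal sketch that assumes f(a) = a.sum() as a stand-in scalar-valued function, since f is not defined in the question:

```python
import torch

# Assumed stand-in for f: R^n -> R (f is not defined in the question).
n = 3

a = torch.ones(n, requires_grad=True)
b = a.sum()                      # b is a scalar
b.backward()                     # implicit gradient argument of 1.0

a2 = torch.ones(n, requires_grad=True)
b2 = a2.sum()
b2.backward(torch.tensor(1.0))   # explicit gradient argument

print(torch.equal(a.grad, a2.grad))  # prints True: both are all-ones vectors
```

Both calls leave the same gradient in a.grad, consistent with the implicit argument being the scalar seed db/db = 1.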
Now change the situation so that f becomes a vector-valued function mapping R^n -> R^m (a linear layer, for example). The PyTorch docs say that we should now pass a gradient argument to the b.backward() call.
However, what "gradient" are they expecting? Do they expect the gradient of b = f(a) with respect to a (an m-by-n Jacobian)?
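The shape requirement can be probed directly. In the sketch below, a hypothetical linear map f(a) = W @ a stands in for f (an assumption, since f is not defined in the question); the argument passed to backward() is a vector v with the same shape as b, and autograd accumulates the vector-Jacobian product J^T v into a.grad, not the full m-by-n Jacobian:

```python
import torch

# Hypothetical linear f: R^n -> R^m, f(a) = W @ a, so the Jacobian J equals W.
n, m = 3, 2
W = torch.arange(m * n, dtype=torch.float32).reshape(m, n)

a = torch.ones(n, requires_grad=True)
b = W @ a               # b has shape (m,)
v = torch.ones(m)       # the "gradient" argument: shape of b, not of a Jacobian
b.backward(v)           # accumulates J^T v into a.grad, shape (n,)

print(torch.equal(a.grad, W.t() @ v))  # prints True
```

Passing the m-by-n Jacobian itself would raise a shape mismatch error, since backward() expects its gradient argument to match b's shape.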
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
