Understanding the role of the gradient parameter in torch.backward()

First, say we have a scalar-valued function f in PyTorch (a linear layer, for example) that maps R^n -> R. So we get the following pseudocode:

a = torch.ones(n, requires_grad=True)  # requires_grad so autograd tracks a

b = f(a)  # assume f is defined somewhere else in the code

b.backward()

The torch docs say that b.backward(torch.FloatTensor([1.])) is equivalent to the backward call above. In this case, does torch.FloatTensor([1.]) represent the derivative db/db? If not, what exactly does it represent, and why is it 1?
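A minimal sketch of the scalar case, using a simple sum as a stand-in for f (the particular f is an assumption, not from the question): calling b.backward() with no argument is the same as seeding the chain rule with db/db = 1.

```python
import torch

n = 3
a = torch.ones(n, requires_grad=True)

b = a.sum()          # scalar-valued f: R^n -> R
b.backward()         # implicitly uses gradient=torch.tensor(1.0)
grad_default = a.grad.clone()

a.grad = None        # reset accumulated gradient before the second call
b = a.sum()
b.backward(torch.tensor(1.0))  # explicit seed: db/db = 1
assert torch.equal(grad_default, a.grad)
```

Both calls leave the same values in a.grad, because the argument is just the starting value of the backpropagated derivative at b.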

Now change the situation so that f becomes a vector-valued function mapping R^n -> R^m (a linear layer, for example). The PyTorch docs say that we should now pass a gradient parameter to the b.backward() call.

However, what "gradient" are they expecting? Do they expect the gradient of b = f(a) with respect to a (the m-by-n Jacobian)?
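For concreteness, here is a sketch of the vector-valued case with a hand-built linear map (W is an assumption for illustration). The gradient argument is not the Jacobian itself: it is a vector v with the same shape as b, and backward accumulates the vector-Jacobian product v^T J into a.grad.

```python
import torch

n, m = 3, 2
a = torch.ones(n, requires_grad=True)
W = torch.arange(m * n, dtype=torch.float32).reshape(m, n)

b = W @ a            # vector-valued f: R^n -> R^m; its Jacobian db/da is W

v = torch.tensor([1.0, 0.0])   # same shape as b; selects the first output
b.backward(v)                  # a.grad accumulates v^T J
assert torch.allclose(a.grad, W[0])  # v^T J here is just the first row of W
```

Choosing v = [1, 0] recovers the gradient of b[0] alone; recovering the full m-by-n Jacobian would take m backward passes with the m one-hot vectors.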



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
