How to perform an outer product of two vectors in Metal shaders?
I'm developing a neural network that runs on the GPU on iOS. To backpropagate the errors in matrix notation, I need to compute the outer product of two vectors.
// Outer product of vector A and Vector B
kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
const device float *inVectorB [[ buffer(1) ]],
device float *outVector [[ buffer(2) ]],
uint id [[ thread_position_in_grid ]]) {
outVector[id] = inVectorA[id] * inVectorB[***?***]; // How to find this position on the thread group (or grid)?
}
Solution 1:[1]
You are using thread_position_in_grid incorrectly. If you are dispatching a 2D grid, it should be declared as uint2 or ushort2; otherwise you only get the x coordinate. Refer to Table 5.7 in the Metal Shading Language Specification.
I'm not sure which outer product you mean, but I think the output should be a matrix. If you store it linearly, the code that computes outVector should look something like this:
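// Outer product of inVectorA and inVectorB, stored linearly in outVector.
// Assumes a 2D grid with one thread per output element:
// id.x indexes inVectorA and id.y indexes inVectorB.
kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
                         const device float *inVectorB [[ buffer(1) ]],
                         uint2 gridSize [[ threads_per_grid ]],
                         device float *outVector [[ buffer(2) ]],
                         uint2 id [[ thread_position_in_grid ]]) {
    // Row id.y, column id.x of the flattened output matrix.
    outVector[id.y * gridSize.x + id.x] = inVectorA[id.x] * inVectorB[id.y];
}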
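The index arithmetic in this kernel can be checked on the CPU before it goes anywhere near a GPU. Below is a minimal sketch in plain C++ (not Metal) that walks the same 2D grid with two loops. outerProductReference and its parameters are hypothetical names for illustration, and the sketch assumes the grid is exactly as wide as inVectorA and as tall as inVectorB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the kernel's index math (hypothetical helper, not a Metal API).
// The simulated grid is a.size() threads wide (x) and b.size() threads tall (y).
void outerProductReference(const std::vector<float>& a,
                           const std::vector<float>& b,
                           std::vector<float>& out) {
    const std::size_t width = a.size();   // plays the role of gridSize.x
    const std::size_t height = b.size();  // plays the role of gridSize.y

    // The output buffer must hold one float per thread in the grid.
    assert(out.size() == width * height);

    for (std::size_t y = 0; y < height; ++y) {     // id.y
        for (std::size_t x = 0; x < width; ++x) {  // id.x
            out[y * width + x] = a[x] * b[y];
        }
    }
}
```

With a = {1, 2, 3} and b = {10, 20}, the buffer comes out as {10, 20, 30, 20, 40, 60}: each row y holds inVectorB[y] times all of inVectorA. If you want the rows to follow inVectorA instead, swap the roles of id.x and id.y.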
Also, if you are dispatching a grid whose size is exactly the length of inVectorA by the length of inVectorB, you can use the threads_per_grid attribute on a kernel argument to find out how big the grid is.
Alternatively, you can just pass the sizes of the vectors alongside the vectors themselves.
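Passing the sizes explicitly matters mostly when the grid is larger than the output, for example when it is rounded up to whole threadgroups. threads_per_grid would then report the padded size, so the row stride and a bounds check should use the real lengths instead. Here is a minimal CPU sketch of that guard in plain C++ (not Metal); countA, countB, groupSize and both function names are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// One simulated "thread" of the kernel. countA and countB are the explicitly
// passed vector lengths (hypothetical parameter names).
void outerProductThread(const float* a, const float* b, float* out,
                        std::size_t countA, std::size_t countB,
                        std::size_t x, std::size_t y) {
    // Threads that fall in the padding must not read or write anything.
    if (x >= countA || y >= countB) return;
    // The row stride is the real length of a, not the padded grid width.
    out[y * countA + x] = a[x] * b[y];
}

// Simulates a dispatch whose grid is rounded up to a multiple of groupSize
// in both dimensions. groupSize must be greater than zero.
std::vector<float> outerProductPadded(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      std::size_t groupSize) {
    const std::size_t gridW = (a.size() + groupSize - 1) / groupSize * groupSize;
    const std::size_t gridH = (b.size() + groupSize - 1) / groupSize * groupSize;

    std::vector<float> out(a.size() * b.size(), 0.0f);
    for (std::size_t y = 0; y < gridH; ++y) {
        for (std::size_t x = 0; x < gridW; ++x) {
            outerProductThread(a.data(), b.data(), out.data(),
                               a.size(), b.size(), x, y);
        }
    }
    return out;
}
```

In the Metal kernel, the same guard would be an early return at the top of the function, with the two lengths supplied through an extra kernel argument.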
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | JustSomeGuy |
