How to efficiently index numpy 1d arrays by rows of a 2d boolean array
I'd like to index a 1d array by the rows of a 2d boolean array. I know how to do this with a single 1d boolean array, but efficiency is important to me and the only approaches I know of for the 2d case are for loops. An example:
I have a 2d mask (Nxd) and a 1d array (d,) that I'd like to index with each row of the mask:
mask = [[False, True, False, True],
        [False, True, True, False]]
y = [0, 1, 2, 3]
From the above, I expect to get:
y_masked = [[1, 3],
            [1, 2]]
I've tried using np.where to index with the boolean array, but I couldn't convert the resulting 1d array back to the correct 2d one, and the shapes I get are wrong. I've also tried simply computing y[mask[i]] for each i, but this is slow. My main issue is that I can't find an approach that isn't row-by-row.
Solution 1:[1]
In [29]: mask = np.array([[False, True, False, True],
    ...:                  [False, True, True, False]])
    ...:
    ...: y = np.array([0, 1, 2, 3])
In [30]: mask
Out[30]:
array([[False,  True, False,  True],
       [False,  True,  True, False]])
In [31]: y
Out[31]: array([0, 1, 2, 3])
First the obvious row by row masking:
In [32]: [y[row] for row in mask]
Out[32]: [array([1, 3]), array([1, 2])]
If we create an array that matches mask in shape we get:
In [33]: Y = y[None,:].repeat(2,axis=0)
In [34]: Y
Out[34]:
array([[0, 1, 2, 3],
       [0, 1, 2, 3]])
In [35]: Y[mask]
Out[35]: array([1, 3, 1, 2])
We could reshape that to 2d, but only if every row has the same number of Trues.
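A minimal sketch of that reshape, using the example arrays from the question. The consistency check and the names counts and flat are my additions for illustration; boolean indexing returns the selected elements in row-major order, which is what makes the reshape line up with the mask rows.

```python
import numpy as np

mask = np.array([[False, True, False, True],
                 [False, True, True, False]])
y = np.array([0, 1, 2, 3])

# Number of True entries in each row of the mask.
counts = mask.sum(axis=1)

# The reshape is only valid when every row selects the same number of elements.
if not (counts == counts[0]).all():
    raise ValueError("rows of mask select different numbers of elements")

# Repeat y once per mask row, then select with the 2d mask (row-major order).
Y = y[None, :].repeat(mask.shape[0], axis=0)
flat = Y[mask]

# One output row per mask row, counts[0] selected elements each.
y_masked = flat.reshape(mask.shape[0], counts[0])
print(y_masked)
```

For the example mask this gives a (2, 2) array; with an inconsistent mask it raises instead of silently producing a misaligned result.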
An alternative to repeat is:
In [39]: np.broadcast_to(y,mask.shape)[mask]
Out[39]: array([1, 3, 1, 2])
broadcast_to returns a read-only view rather than a copy, so I expect this to save on memory, but it isn't faster:
In [40]: timeit np.broadcast_to(y,mask.shape)[mask]
13.2 µs ± 335 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [41]: timeit y[None,:].repeat(2,axis=0)[mask]
4.72 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
It's possible that the broadcast_to approach scales better, but we can only tell that by testing.
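One way to run that test outside IPython is sketched below. The array sizes, the random mask, and the helper names with_repeat and with_broadcast are assumptions chosen for illustration, not measurements from my session, so treat any numbers it prints as specific to your machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 1000, 100                   # hypothetical problem size
mask = rng.random((n_rows, n_cols)) < 0.5    # random boolean mask
y = np.arange(n_cols)

def with_repeat():
    # Materialises a full (n_rows, n_cols) copy of y before masking.
    return y[None, :].repeat(n_rows, axis=0)[mask]

def with_broadcast():
    # Masks a read-only broadcast view instead of a copy.
    return np.broadcast_to(y, mask.shape)[mask]

# Both approaches must select exactly the same elements.
same = np.array_equal(with_repeat(), with_broadcast())

for fn in (with_repeat, with_broadcast):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e6:.1f} us per call")
```

Changing n_rows and n_cols shows how the two approaches compare as the mask grows.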
With where (np.where(mask) with a single argument is equivalent to np.nonzero(mask)), we could do:
In [42]: np.nonzero(mask)
Out[42]: (array([0, 0, 1, 1]), array([1, 3, 1, 2]))
In [43]: y[np.nonzero(mask)[1]]
Out[43]: array([1, 3, 1, 2])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
