How to efficiently index numpy 1d arrays by rows of a 2d boolean array

I'd like to index a 1d array by the rows of a 2d boolean array. I know how to do this with a 1d boolean array, but efficiency matters to me, and the only approaches I know for the 2d case are for loops. An example:

I have a 2d mask of shape (N, d) and a 1d array of shape (d,) that I'd like to index with each row of the mask:

mask = [[False, True, False, True], 
        [False, True, True, False]]

y = [0, 1, 2, 3]

From the above, I expect to get:

y_masked = [[1, 3],
            [1, 2]]

I've tried using np.where to index with the boolean array, but I've been unable to convert the resulting 1d array back to the correct 2d one, and the shapes I get are incorrect. I've also tried simply computing y[mask[i]] for each i, but this is slow. My main issue is that I can't find an approach that isn't row-by-row.



Solution 1:[1]

In [28]: import numpy as np
In [29]: mask = np.array([[False, True, False, True],
    ...:                  [False, True, True, False]])
    ...: y = np.array([0, 1, 2, 3])
In [30]: mask
Out[30]: 
array([[False,  True, False,  True],
       [False,  True,  True, False]])
In [31]: y
Out[31]: array([0, 1, 2, 3])

First, the obvious row-by-row masking:

In [32]: [y[row] for row in mask]
Out[32]: [array([1, 3]), array([1, 2])]

If we create an array that matches mask in shape, we can index it with the whole mask at once, which gives a flat result:

In [33]: Y = y[None,:].repeat(2,axis=0)
In [34]: Y
Out[34]: 
array([[0, 1, 2, 3],
       [0, 1, 2, 3]])
In [35]: Y[mask]
Out[35]: array([1, 3, 1, 2])

We could reshape that to 2d, but only if the number of Trues per row is consistent.
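The reshape step can be sketched as follows. This is only an illustration under the assumption stated above: the check on counts and the ValueError are my additions, not part of the original session.

```python
import numpy as np

mask = np.array([[False, True, False, True],
                 [False, True, True, False]])
y = np.array([0, 1, 2, 3])

# Number of True values in each row of the mask.
counts = mask.sum(axis=1)

# Reshaping only makes sense when every row selects the same number of elements.
if not (counts == counts[0]).all():
    raise ValueError("rows select different numbers of elements")

flat = np.broadcast_to(y, mask.shape)[mask]       # 1d, row-major order
y_masked = flat.reshape(mask.shape[0], counts[0])
print(y_masked)
```

For the example mask this prints the 2 x 2 array the question asks for.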

An alternative to repeat is:

In [39]: np.broadcast_to(y,mask.shape)[mask]
Out[39]: array([1, 3, 1, 2])
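The difference between the two is that np.broadcast_to returns a read-only view of y, while repeat allocates a full copy. A small check of that, using the same mask and y (the names view and copy are just for illustration):

```python
import numpy as np

mask = np.array([[False, True, False, True],
                 [False, True, True, False]])
y = np.array([0, 1, 2, 3])

view = np.broadcast_to(y, mask.shape)             # no data is copied here
copy = y[None, :].repeat(mask.shape[0], axis=0)   # allocates a full (N, d) array

print(view.strides[0])             # 0: every row reads the same memory
print(np.shares_memory(view, y))   # True
print(view.flags.writeable)        # False: the broadcast view is read-only
print(np.shares_memory(copy, y))   # False
```

Note that the boolean-indexed result is a new array in both cases; only the intermediate (N, d) array is avoided.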

I expect this to save on memory, but it isn't faster:

In [40]: timeit np.broadcast_to(y,mask.shape)[mask]
13.2 µs ± 335 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [41]: timeit y[None,:].repeat(2,axis=0)[mask]
4.72 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

It's possible that the broadcast_to approach scales better, but we can only tell that by testing.
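One way to run such a test outside IPython is sketched below. The sizes n_rows and d and the random mask are hypothetical choices of mine; no timings are quoted here because they depend on the machine and on the array sizes.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n_rows, d = 1000, 200                    # hypothetical sizes, adjust to your data
mask = rng.random((n_rows, d)) < 0.5     # random boolean mask of shape (n_rows, d)
y = np.arange(d)

def with_repeat():
    return y[None, :].repeat(n_rows, axis=0)[mask]

def with_broadcast():
    return np.broadcast_to(y, mask.shape)[mask]

# Both approaches must select the same values before timing means anything.
assert np.array_equal(with_repeat(), with_broadcast())

for fn in (with_repeat, with_broadcast):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e6:.1f} us per call")
```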

With where (np.where(mask) with a single argument is equivalent to np.nonzero(mask)), we could do:

In [42]: np.nonzero(mask)
Out[42]: (array([0, 0, 1, 1]), array([1, 3, 1, 2]))
In [43]: y[np.nonzero(mask)[1]]
Out[43]: array([1, 3, 1, 2])
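If the rows do not all select the same number of elements, a 2d array is not possible, but the flat result can still be cut into per-row pieces without a Python-level masking loop. A minimal sketch, with a mask I changed so that the rows differ in length; np.split returns a list of arrays here:

```python
import numpy as np

# Modified example: the rows select 2 and 3 elements respectively.
mask = np.array([[False, True, False, True],
                 [True, True, True, False]])
y = np.array([0, 1, 2, 3])

rows, cols = np.nonzero(mask)   # indices of the True entries, in row-major order
flat = y[cols]                  # selected values, still flat

# Split the flat result at the points where one row ends and the next begins.
counts = mask.sum(axis=1)
pieces = np.split(flat, np.cumsum(counts)[:-1])
print(pieces)
```

Whether this beats the list comprehension depends on the array sizes, so it is worth timing on real data.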

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
