'How to shrink a 2D tensor to another 2D tensor using boolean mask?

Say I have a 2D pytorch tensor and a 2D numpy boolean as follows,

a = torch.tensor([[ 0.,  1.,  2.],
                  [ 3.,  4.,  5.],
                  [ 6.,  7.,  8.],
                  [ 9., 10., 11.],
                  [12., 13., 14.]])

m = numpy.array([[ False,  True, False],
                 [  True, False,  True],
                 [ False,  True,  True],
                 [ False, False, False],
                 [  True, False, False]])

They have the same dimension and the number of True's in each column of m is the same.

I need to get the 2x3 tensor that is

a.transpose(0,1).masked_select(torch.from_numpy(m.transpose())).reshape(a.shape[1],-1).transpose(0,1)

which is

tensor([[ 3.,  1.,  5.],
        [12.,  7.,  8.]])

The actual tensor is very large, and the operation needs to be performed many times. So I want to ask what is an efficient way of doing this (or the most efficient way).



Solution 1:[1]

In my benchmarks a jitted numba solution is the fastest, I could find

My benchmarks for a, m with shape (10000,200)(equal result tensors)

1 @numba.jit 13.2 ms (3.46x)
2 list comprehension 31.3 ms (1.46x)
3 baseline 45.7 ms (1.00x)

Generation of sufficiently large sample data for benchmarking

import torch
import numpy as np

def generate_data(rows=500, columns=100):
    
    a = torch.from_numpy(np.random.uniform(1,10, (rows,columns)).astype(np.float32))

    # argsort trick by @divakar https://stackoverflow.com/a/55317373/14277722
    def shuffle_along_axis(a, axis):
        idx = np.random.rand(*a.shape).argsort(axis=axis)
        return np.take_along_axis(a,idx,axis=axis)

    m = shuffle_along_axis(np.full((columns,rows), np.random.randint(2, size=rows)), 1).astype('bool').T
    return a, np.ascontiguousarray(m)

a, m = generate_data(10000,200)

A jitted numba implementation

import numba as nb

@nb.njit
def gather2d(arr1, arr2):
    res = np.zeros((np.count_nonzero(arr2[:,0]), arr1.shape[1]), np.float32)
    counter = np.zeros(arr1.shape[1], dtype=np.intp)
    for i in range(arr1.shape[0]):
        for j in range(arr1.shape[1]):
            if arr2[i,j]:
                res[counter[j], j] = arr1[i,j]
                counter[j] += 1
    return res

torch.from_numpy(gather2d(a.numpy(),m))

Output

# %timeit 10 loops, best of 5: 13.2 ms per loop
tensor([[2.1846, 7.8890, 8.8218,  ..., 4.8309, 9.2853, 6.4404],
        [5.8842, 3.7332, 6.7436,  ..., 1.2914, 3.2983, 3.5627],
        [9.5128, 2.4283, 2.2152,  ..., 4.9512, 9.7335, 9.6252],
        ...,
        [7.3193, 7.8524, 9.6654,  ..., 3.3665, 8.8926, 4.7660],
        [1.3829, 1.3347, 6.6436,  ..., 7.1956, 4.0446, 6.4633],
        [6.4264, 3.6283, 3.6385,  ..., 8.4152, 5.8498, 5.0281]])

Against a vectorized baseline solution

# %timeit 10 loops, best of 5: 45.7 ms per loop
a.gather(0, torch.from_numpy(np.nonzero(m.T)[1].reshape(-1, m.shape[1], order='F')))

A python list comprehension turns out to be surprisingly fast

def g(arr1,arr2):
    return np.array([i[j] for i,j in zip(arr1.T,arr2.T)]).T

# %timeit 10 loops, best of 5: 31.3 ms per loop
torch.from_numpy(g(a.numpy(), m))

Solution 2:[2]

You can try this way by using only NumPy and PyTorch:

b,c = m.nonzero()
b = torch.tensor(b)
c = torch.tensor(c)
a[b,c].reshape(2,3)

#output

tensor([[ 1.,  3.,  5.],
        [ 7.,  8., 12.]])   # True values are taken on axis=1

I used the same example provided by @Michael Szczesny to measure the time:

import numpy as np
import timeit
import torch

rows, columns = (10000,200)
a = torch.from_numpy(np.random.uniform(1,10, (rows,columns)).astype(np.float32))
m = np.random.choice([False, True], size=(rows, columns))

starttime = timeit.default_timer()
b,c = m.nonzero()
b = torch.tensor(b)
c = torch.tensor(c)
a[b,c]
print(f"The time difference is :{(timeit.default_timer() - starttime)*1000} ms")

#output
The time difference is : 26.4 ms

It is better than the second and third approaches of @Michael Szczesny.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2