One large NumPy array (3 million rows, 5 columns): how to pick rows that meet several conditions at the same time? (Python 3.8.8)

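        import numpy as np

        def func(data):
            # Output buffer; rows that match no condition stay all-zero
            A = np.zeros([len(data), 5], np.int16)
            for i in range(len(data)):
                # Column 1 in [-13, -10] and column 3 in [-22, -20]
                if(data[i, 1] <= -10 and data[i, 1] >= -13 and
                   data[i, 3] <= -20 and data[i, 3] >= -22):
                    A[i] = data[i]

                # Column 1 in [-19, -16] and column 3 in [-30, -24]
                elif(data[i, 1] <= -16 and data[i, 1] >= -19 and
                     data[i, 3] <= -24 and data[i, 3] >= -30):
                    A[i] = data[i]

                # .... (for another similar 8 elif conditions)

                else:
                    continue

            # Drop the rows that were never filled in
            return A[~np.all(A == 0, axis=1)]

        func(data)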

Problem: I have a large NumPy array and I need to extract the whole rows (not just their indices or single values) that meet those conditions. The code runs, but it is very slow. That wouldn't be an issue on its own, but I have to read another 800 files and then perform other tasks.

How can I optimise this function? Thank you in advance.



Solution 1:[1]

My solution is very close to AJH's, but I believe it is a bit simpler, and you don't need to keep a full-size array A in memory. I'm not sure it changes much, but it is a bit less memory-intensive.

        def func(data):
            condition_1 = ((data[:, 1] <= -10) & (data[:, 1] >= -13) & (data[:, 3] <= -20) & (data[:, 3] >= -22))
            condition_2 = ((data[:, 1] <= -16) & (data[:, 1] >= -19) & (data[:, 3] <= -24) & (data[:, 3] >= -30))
            mask = (condition_1 | condition_2)
            return data[mask]

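Then just add all the conditions you need and combine them in the mask. For information, & is the element-wise "and" and | is the element-wise "or". I find the full keywords (and, or) easier to read, but they don't work on NumPy arrays: they raise a ValueError because the truth value of a multi-element array is ambiguous. The parentheses around each comparison are required, because & and | bind more tightly than <= and >=.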
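With ten or so conditions, writing one condition_N variable per range gets repetitive. A possible variation, given here only as a sketch, is to keep the ranges in a table and OR the masks together in a loop. The names RANGES and select_rows are hypothetical, the demo array is made up, and only the two ranges from the answer are listed; the remaining eight would need to be filled in from your own conditions.

```python
import numpy as np

# Hypothetical range table: (col1_low, col1_high, col3_low, col3_high).
# Only these two rows come from the answer; add your other conditions here.
RANGES = [
    (-13, -10, -22, -20),
    (-19, -16, -30, -24),
]

def select_rows(data, ranges=RANGES):
    """Return the rows whose columns 1 and 3 fall inside any listed range."""
    col1, col3 = data[:, 1], data[:, 3]
    # Start with "no row selected" and OR in one condition at a time
    mask = np.zeros(len(data), dtype=bool)
    for lo1, hi1, lo3, hi3 in ranges:
        mask |= (col1 >= lo1) & (col1 <= hi1) & (col3 >= lo3) & (col3 <= hi3)
    return data[mask]

# Made-up data for illustration; column 0 is just a row id
demo = np.array([
    [0, -12, 0, -21, 0],   # matches the first range
    [1, -17, 0, -25, 0],   # matches the second range
    [2, -12, 0, -25, 0],   # column 1 fits range 1, column 3 fits range 2: no match
    [3,   5, 0,   5, 0],   # matches nothing
], dtype=np.int16)

selected = select_rows(demo)   # keeps the rows with ids 0 and 1
```

Only one boolean mask of length len(data) is kept across iterations, and the loop runs once per condition rather than once per row, so the per-row work stays inside NumPy.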

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ssayan