One large NumPy array (3 million rows by 5 columns) - how to pick rows that meet several conditions at the same time? Python 3.8.8
import numpy as np

def func(data):
    # Rows that match no condition stay all-zero and are dropped at the end.
    A = np.zeros([len(data), 5], np.int16)
    for i in range(len(data)):
        if (data[i, 1] <= -10 and data[i, 1] >= -13 and
                data[i, 3] <= -20 and data[i, 3] >= -22):
            A[i] = data[i]
        elif (data[i, 1] <= -16 and data[i, 1] >= -19 and
                data[i, 3] <= -24 and data[i, 3] >= -30):
            A[i] = data[i]
        # .... (another 8 similar elif conditions)
        else:
            continue
    return A[~np.all(A == 0, axis=1)]

func(data)
Problem: I have a large NumPy array and need to extract whole rows (not just their indices or values) that meet these conditions. The code runs, but it is very slow. That wouldn't be an issue on its own, but I have to read another 800 files and then perform other tasks.
How can I optimise this function? Thank you in advance.
Solution 1:[1]
My solution is very close to AJH's, but I believe it is a bit simpler, and you don't need to keep a full-size A array in memory. I'm not sure it changes much, but it is a bit less memory-intensive.
def func(data):
    condition_1 = ((data[:, 1] <= -10) & (data[:, 1] >= -13) & (data[:, 3] <= -20) & (data[:, 3] >= -22))
    condition_2 = ((data[:, 1] <= -16) & (data[:, 1] >= -19) & (data[:, 3] <= -24) & (data[:, 3] >= -30))
    mask = (condition_1 | condition_2)
    return data[mask]
Then just add all the conditions you need.
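With ten conditions, writing each one out by hand gets repetitive. One possible variation, only a sketch and not part of the original answer, is to keep the bounds in a list and build the mask in a loop. The names RANGES and select_rows are hypothetical, and the list holds only the two range pairs shown above; any further entries would have to come from your own conditions.

```python
import numpy as np

# Hypothetical table of (col 1 low, col 1 high, col 3 low, col 3 high) bounds.
# Only the two range pairs from the answer above are listed here.
RANGES = [
    (-13, -10, -22, -20),
    (-19, -16, -30, -24),
]

def select_rows(data, ranges=RANGES):
    """Return the rows of data whose columns 1 and 3 fall inside any range pair."""
    col1 = data[:, 1]
    col3 = data[:, 3]
    mask = np.zeros(len(data), dtype=bool)
    for lo1, hi1, lo3, hi3 in ranges:
        # OR each condition into the running mask, one vectorised pass per range pair.
        mask |= (col1 >= lo1) & (col1 <= hi1) & (col3 >= lo3) & (col3 <= hi3)
    return data[mask]

# Small made-up array, used only to show the call.
data = np.array([
    [0, -11, 0, -21, 0],   # matches the first range pair
    [1, -17, 0, -25, 0],   # matches the second range pair
    [2, -11, 0, -25, 0],   # column 1 fits the first pair, column 3 the second: no match
    [3,   5, 0,   5, 0],   # matches nothing
], dtype=np.int16)

selected = select_rows(data)
```

Unlike the zero-filled A approach, this never drops a matching row that happens to be all zeros, because the selection relies on the mask alone.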
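For information, & is the element-wise "and" and | is the element-wise "or". I find the full keywords easier to read, but and/or don't work on NumPy arrays: they raise a ValueError because the truth value of a multi-element array is ambiguous. Keep the parentheses around each comparison, since & and | bind more tightly than <= and >=.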
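A small illustration of that difference, using a made-up three-element array (the names values, mask and keyword_failed are only for this example):

```python
import numpy as np

values = np.array([-12, -5, -11])

# Element-wise AND; each comparison is parenthesised because & binds tighter than <= and >=.
mask = (values <= -10) & (values >= -13)

# The and keyword needs a single truth value for the whole array, which NumPy refuses to give.
try:
    (values <= -10) and (values >= -13)
    keyword_failed = False
except ValueError:
    keyword_failed = True
```

Here mask is a boolean array with one entry per element, which is what indexing such as data[mask] expects.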
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ssayan |
