How to filter data based on a statistic from binned_statistic_2d without a loop

I have a fairly large dataset of xyz points. I'm using binned_statistic_2d from the scipy.stats module to calculate the mean of x in yz bins. I would like to further process the points that belong to bins where the mean is below a threshold. How can I achieve this without a loop (it gets very slow with the amount of data I have)? I have tried the np.isin function, but I have not been able to reproduce the results of the loop. Below is an example of the code with the loop (the real dataset has several million points):

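import numpy as np
import scipy.stats as stats

# Example data; the real dataset has several million points
x = np.random.randint(12000, 14000, (2000, 1))
z = np.random.randint(-1250, 1250, (2000, 1))
y = np.random.randint(1000, 2500, (2000, 1))

# Bin edges, 100 units wide, covering the full z and y ranges
z_bins = np.arange(z.min(), z.max() + 100, 100)
y_bins = np.arange(y.min(), y.max() + 100, 100)

# Mean of x in each (z, y) bin; binnumber has shape (2, N) with 1-based bin indices
meanx, z_edges, y_edges, binnumber = stats.binned_statistic_2d(
    z[:, 0], y[:, 0], x[:, 0], statistic='mean',
    bins=[z_bins, y_bins], expand_binnumbers=True)
meanx_flatten = meanx.flatten(order='C')

# (row, column) indices of the bins whose mean is below the threshold
indices = np.argwhere(meanx < 12500)
binnumber -= 1  # convert to 0-based indices so they match meanx
indices = indices.transpose()

# np.isin attempt: it compares each index against all values in indices,
# not against (row, column) pairs, so it does not match the loop result
mask = np.argwhere(np.isin(binnumber, indices).all(axis=0))

# Loop version: collect the point indices for each selected bin
mask3 = np.empty(0)
for i in range(len(indices[0])):
    idx = np.where((binnumber[0] == indices[0, i]) & (binnumber[1] == indices[1, i]))
    idx = idx[0]
    mask3 = np.concatenate((mask3, idx), axis=0)
mask3 = mask3.astype(int)
x_new = x[mask3]
y_new = y[mask3]
z_new = z[mask3]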

Thanks for helping me out, I'm quite a beginner at coding.
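
One possible loop-free approach, sketched here as an illustration and not as an answer from the original thread, is to build a 2D boolean array marking the bins whose mean is below the threshold and then index that array directly with the per-point bin numbers. The sketch below assumes 1-D input arrays, a fixed random seed, and the same threshold of 12500 as in the question; the names `low`, `low_padded`, and `mask` are made up for this example. Padding the boolean array with a border of False lets the raw 1-based bin numbers be used as-is, and it also means points falling outside the bin range (bin number 0 or one past the last bin) are simply not selected.

```python
import numpy as np
from scipy import stats

# Example data as 1-D arrays (assumption: the question used shape (N, 1))
rng = np.random.default_rng(0)
n = 2000
x = rng.integers(12000, 14000, n)
z = rng.integers(-1250, 1250, n)
y = rng.integers(1000, 2500, n)

z_bins = np.arange(z.min(), z.max() + 100, 100)
y_bins = np.arange(y.min(), y.max() + 100, 100)

meanx, z_edges, y_edges, binnumber = stats.binned_statistic_2d(
    z, y, x, statistic='mean', bins=[z_bins, y_bins], expand_binnumbers=True)

# True for bins whose mean is below the threshold.
# Empty bins hold NaN, and NaN < 12500 evaluates to False.
with np.errstate(invalid='ignore'):
    low = meanx < 12500

# Add a border of False so the 1-based bin numbers index the array directly;
# out-of-range points (bin number 0 or nbins + 1) land on the border.
low_padded = np.pad(low, 1, mode='constant', constant_values=False)

# One boolean per point: does the point sit in a selected bin?
mask = low_padded[binnumber[0], binnumber[1]]

x_new, y_new, z_new = x[mask], y[mask], z[mask]
```

Note that a boolean mask keeps the points in their original order, whereas the loop in the question groups them bin by bin, so the two results contain the same points but possibly in a different order.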


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
