How to remove columns with more than 90% zeros in a pandas DataFrame
I am unable to trace the issue with my code. The goal is to remove all columns in which more than 90% of the values are zero. The code is as follows:
num_vars = data.select_dtypes(include=['float64', 'int64'])
num_vars.shape
(1904, 500)
# Removing variables with >90% 0's
for i in num_vars.columns:
    if ((len(num_vars[i].loc[num_vars[i]==0])/len(num_vars))>0.9): #checking if 90% data is zero
        num_vars.drop(i,axis=1,inplace=True) #delete the column
num_vars.shape
(1904, 500)
As shown above, even after running the loop to remove the columns with more than 90% zeros, num_vars.shape remains the same. I am not sure where the issue is. Please advise.
Solution 1:[1]
This should do the job:
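# True for columns where at most 90% of the values are zero
mask = (num_vars == 0).sum() / len(num_vars) <= 0.9
# Keep only those columns; the original frame is left unchanged
new_num_vars = num_vars.loc[:, mask]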
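To see how the mask behaves, here is a minimal sketch on a small made-up DataFrame. The column names and values are hypothetical and not taken from the question's data. It also prints the per-column zero fraction, which may help check whether any column in the real data actually crosses the 90% threshold.

```python
import pandas as pd

# Hypothetical toy data: 20 rows, three numeric columns
df = pd.DataFrame({
    "mostly_zero": [0] * 19 + [1],     # 95% zeros, should be dropped
    "half_zero": [0, 1] * 10,          # 50% zeros, should be kept
    "dense": list(range(1, 21)),       # no zeros, should be kept
})

# The mean of a boolean frame gives the share of True values per column,
# i.e. the fraction of zeros in each column
zero_frac = (df == 0).mean()

# Keep only the columns whose zero fraction does not exceed 90%
filtered = df.loc[:, zero_frac <= 0.9]

print(zero_frac)
print(filtered.shape)  # (20, 2)
```

Building a new DataFrame from a boolean mask avoids modifying a frame while looping over its columns, and the intermediate zero_frac Series can be inspected directly if the shape does not change as expected.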
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | D.Manasreh |
