Optimizing nested nan removal from a pandas data frame
I've got an unusual pandas data frame that contains positional (2D coordinate) data. It looks like:
0 ... 2
0 [110.001220703125, -85.76113891601562] ... [79.7227783203125, -131.07473754882812]
1 [118.484619140625, -73.60284423828125] ... [nan, nan]
2 [125.63433837890625, -56.826995849609375] ... [nan, nan]
3 [130.34637451171875, -38.804656982421875] ... [nan, nan]
4 [129.54150390625, -32.026611328125] ... [nan, nan]
The NaNs are there because the data comes from a tracking neural net: those values had an associated certainty below threshold, so I masked them. I realize I can't use .dropna() directly, since each cell holds a list, and [nan, nan] is not itself nan, so pandas doesn't treat the cell as missing. Therefore, I wrote this:
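    import numpy as np

    IndexToDrop_List = []
    # iterrows() yields (index label, row), so this works for any index
    for Ind, Row in Frame.iterrows():
        for Vals in Row:
            # A masked coordinate is [nan, nan], so checking x is enough
            if np.isnan(Vals[0]):
                IndexToDrop_List.append(Ind)
                break
    Frame = Frame.drop(IndexToDrop_List).reset_index(drop=True)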
This takes the index label of each row and removes the entire row if a [nan, nan] coordinate is found. It works, but I was wondering whether there is a way to use .apply() to shorten it. These datasets can get pretty large, so any time saved would be ideal.
The expected output:
0 ... 2
0 [9.29730224609375, 184.36279296875] ... [-61.94122314453125, 153.804931640625]
1 [4.42108154296875, 184.70294189453125] ... [-65.76788330078125, 155.11004638671875]
2 [-1.9549560546875, 182.90460205078125] ... [-67.963134765625, 155.1727294921875]
3 [0.0401611328125, 177.62042236328125] ... [-68.549072265625, 146.52874755859375]
4 [36.03021240234375, 162.80792236328125] ... [-23.573974609375, 135.88336181640625]
Solution 1:[1]
If you only have a few columns:
    df = pd.DataFrame({'a': [[1, 2], [3, 4]], 'b': [[np.nan, 1], [2, 3]]})

            a         b
    0  [1, 2]  [nan, 1]
    1  [3, 4]    [2, 3]
You could expand the columns to form a regular matrix and operate on it:
    has_nan_idx = [np.isnan(np.array(df[c].to_list())).any(axis=1) for c in df.columns]
    df[~np.logical_or.reduce(has_nan_idx)]

            a       b
    1  [3, 4]  [2, 3]
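If there are many columns, the same idea can probably be applied to the whole frame at once instead of column by column. The following is only a sketch under two assumptions: the frame is non-empty, and every cell holds a sequence of the same length (here [x, y]), so all cells stack into one regular 3D array. The function name drop_rows_with_nan_coords and the sample frame are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd


def drop_rows_with_nan_coords(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop every row in which any cell holds a coordinate containing NaN.

    Assumes a non-empty frame whose cells are all sequences of equal length.
    """
    # Stack all cells into one float array of shape (n_rows, n_cols, 2).
    coords = np.array(frame.to_numpy().tolist(), dtype=float)
    # A row is bad if any value in any of its cells is NaN.
    bad_rows = np.isnan(coords).any(axis=(1, 2))
    return frame[~bad_rows].reset_index(drop=True)


# Hypothetical sample data: row 0 has a masked [nan, nan] coordinate.
df = pd.DataFrame({
    "a": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    "b": [[np.nan, np.nan], [2.0, 3.0], [7.0, 8.0]],
})
cleaned = drop_rows_with_nan_coords(df)
print(cleaned)
```

Unlike the loop in the question, this flags a row if either value of a coordinate is NaN, not only the first. It also builds a full float copy of the data, so memory use grows with the size of the frame.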
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Z Li |
