Optimizing nested NaN removal from a pandas data frame

I've got an unusual pandas data frame that contains positional (2D coordinate) data; it looks like:

                                          0  ...                                          2
0        [110.001220703125, -85.76113891601562]  ...    [79.7227783203125, -131.07473754882812]
1        [118.484619140625, -73.60284423828125]  ...                                 [nan, nan]
2     [125.63433837890625, -56.826995849609375]  ...                                 [nan, nan]
3     [130.34637451171875, -38.804656982421875]  ...                                 [nan, nan]
4           [129.54150390625, -32.026611328125]  ...                                 [nan, nan]

The NaNs come from a tracking neural net: those values have an associated certainty below threshold, so I've masked them. I realize I can't use .dropna() directly, because each cell holds a list rather than a scalar, and a [nan, nan] list is not itself NaN as far as pandas is concerned. Therefore, I wrote this loop:

    import numpy as np

    IndexToDrop_List = []
    for Ind in Frame.index.values:
        Row = Frame.loc[Ind]
        for Vals in Row:
            # Flag the whole row as soon as one [nan, nan] coordinate is found
            if np.isnan(Vals[0]):
                IndexToDrop_List.append(Ind)
                break
    Frame = Frame.drop(IndexToDrop_List).reset_index(drop=True)

This records the index of a row whenever an [nan, nan] coordinate is found and then drops those rows in one go. It works; however, I was wondering if there is a way to use .apply() to shorten this. These datasets can get pretty large, and any time saved would be ideal.
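For reference, a minimal sketch of what an .apply()-based rewrite of the loop might look like (this is only one possible approach, not tested against the real data; it presumes Frame is the DataFrame above and that numpy is imported as np):

    import numpy as np

    # Keep a row only if none of its coordinate lists contain NaN
    mask = Frame.apply(lambda row: not any(np.isnan(v).any() for v in row), axis=1)
    Frame = Frame[mask].reset_index(drop=True)

Note that this still inspects every cell in Python, so it is mostly a readability win rather than a guaranteed speed-up.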

The expected output:

                                             0  ...                                          2
0          [9.29730224609375, 184.36279296875]  ...     [-61.94122314453125, 153.804931640625]
1       [4.42108154296875, 184.70294189453125]  ...   [-65.76788330078125, 155.11004638671875]
2       [-1.9549560546875, 182.90460205078125]  ...      [-67.963134765625, 155.1727294921875]
3        [0.0401611328125, 177.62042236328125]  ...     [-68.549072265625, 146.52874755859375]
4      [36.03021240234375, 162.80792236328125]  ...     [-23.573974609375, 135.88336181640625]


Solution 1:[1]

If you only have a few columns:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [[1, 2], [3, 4]], 'b': [[np.nan, 1], [2, 3]]})
    a       b
0   [1, 2]  [nan, 1]
1   [3, 4]  [2, 3]

You could expand the columns to form a regular matrix and operate on it:

# For each column, stack the lists into an (n_rows, 2) array and flag rows with any NaN
has_nan_idx = [np.isnan(np.array(df[c].to_list())).any(axis=1) for c in df.columns]
df[~np.logical_or.reduce(has_nan_idx)]
    a       b
1   [3, 4]  [2, 3]
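
Applied to the original Frame, the same idea might look like the sketch below (an assumption, not part of the original answer; it presumes every cell is a 2-element list and that numpy is imported as np):

    import numpy as np

    # Stack each column of coordinate lists into an (n_rows, 2) float array,
    # flag rows that contain any NaN, then drop them all in one pass
    has_nan = [np.isnan(np.array(Frame[c].to_list())).any(axis=1) for c in Frame.columns]
    Frame = Frame[~np.logical_or.reduce(has_nan)].reset_index(drop=True)

This avoids the row-by-row Python loop entirely, which should matter more as the datasets grow.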

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: Z Li