'pandas trim with NA value
I have some data frame with the following structure.
timestamp Col_A Col_B
2021-04-19T16:12:00Z NA NA
2021-04-19T16:13:00Z NA NA
2021-04-19T16:14:00Z 1 NA
2021-04-19T16:15:00Z 2 NA
2021-04-19T16:16:00Z 3 1
2021-04-19T16:17:00Z 4 2
2021-04-19T16:18:00Z 5 3
2021-04-19T16:19:00Z 6 4
2021-04-19T16:20:00Z NA NA
I would like to make a trim function where basically it would trim all contiguous NA columns from top and bottom i.e. in this case it would filter the frame as:
2021-04-19T16:14:00Z 1 NA
2021-04-19T16:15:00Z 2 NA
2021-04-19T16:16:00Z 3 1
2021-04-19T16:17:00Z 4 2
2021-04-19T16:18:00Z 5 3
2021-04-19T16:19:00Z 6 4
Solution 1:[1]
Check with cumprod after all
s = df.isna().all(1)
out = df[s.cumprod() + s.iloc[::-1].cumprod() == 0]
Solution 2:[2]
This is how you can drop the rows that have NA in the defined columns:
df.dropna(subset=['Col_A', 'Col_B'], how='all')
Solution 3:[3]
You can try first_valid_index and last_valid_index
out = df.loc[df[['Col_A', 'Col_B']].first_valid_index():df[['Col_A', 'Col_B']].last_valid_index(), :]
print(out)
timestamp Col_A Col_B
2 2021-04-19T16:14:00Z 1.0 NaN
3 2021-04-19T16:15:00Z 2.0 NaN
4 2021-04-19T16:16:00Z 3.0 1.0
5 2021-04-19T16:17:00Z 4.0 2.0
6 2021-04-19T16:18:00Z 5.0 3.0
7 2021-04-19T16:19:00Z 6.0 4.0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BENY |
| Solution 2 | Andrey Lukyanenko |
| Solution 3 |
