How to calculate timestamp differences in a sequence in Python
My DataFrame looks like this:
0 2021-01-01 01:00:00+ 00:00
1 2021-01-01 01:05:00+ 00:00
2 2021-01-01 01:10:00+ 00:00
3 2021-01-01 01:15:00+ 00:00
4 2021-01-04 06:00:00+ 00:00
5 2021-01-04 06:05:00+ 00:00
This column contains timestamps. I want to calculate the duration of each period (a run of rows with no gap bigger than 5 minutes between them), along with its beginning and ending. For example, here I want to get as a result:
- 15 minutes from 2021-01-01 01:00:00+ 00:00 till 2021-01-01 01:15:00+ 00:00
- 5 minutes from 2021-01-04 06:00:00+ 00:00 till 2021-01-04 06:05:00+ 00:00
How can I do this?
Solution 1:[1]
IIUC, you could use a custom group and agg:
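import pandas as pd

# ensure datetime if string (the literal '+ 00:00' suffix is matched as-is, so the result is timezone-naive)
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S+ 00:00')
# compute a custom group: a new group starts whenever the gap to the previous row exceeds 5min
group = df['timestamp'].diff().gt('5min').cumsum()
# aggregate each group into its start, end and duration
out = (df
       .groupby(group)['timestamp']
       .agg(**{'start': 'min', 'end': 'max', 'delta': lambda g: g.max()-g.min()})
      )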
output:
                        start                 end           delta
timestamp
0         2021-01-01 01:00:00 2021-01-01 01:15:00 0 days 00:15:00
1         2021-01-04 06:00:00 2021-01-04 06:05:00 0 days 00:05:00
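For anyone who wants to try the same idea end to end, here is a minimal self-contained sketch of the diff / cumsum / groupby approach, run on the sample data from the question. The function name `summarize_periods`, the `max_gap` parameter and the `minutes` column are hypothetical names made up for illustration, and the sketch assumes the offsets are written without the space (`+00:00`) so that pandas can parse them as timezone-aware values.

```python
import pandas as pd


def summarize_periods(timestamps, max_gap="5min"):
    """Hypothetical helper: split timestamps into periods wherever the gap
    between consecutive values exceeds max_gap, then summarize each period."""
    # Parse to timezone-aware UTC datetimes and sort, so diff() compares neighbours in time.
    ts = pd.Series(pd.to_datetime(timestamps, utc=True)).sort_values().reset_index(drop=True)
    # True where a row starts a new period; the first diff is NaT, which compares as False.
    starts_new_period = ts.diff() > pd.Timedelta(max_gap)
    # Running count of period starts gives every row a period label (0, 0, ..., 1, 1, ...).
    period_id = starts_new_period.cumsum()
    out = ts.groupby(period_id).agg(start="min", end="max")
    out["delta"] = out["end"] - out["start"]
    # Duration expressed in minutes, as asked in the question.
    out["minutes"] = out["delta"].dt.total_seconds() / 60
    out.index.name = "period"
    return out


# Sample data from the question (assumption: offsets rewritten without the space).
raw = [
    "2021-01-01 01:00:00+00:00",
    "2021-01-01 01:05:00+00:00",
    "2021-01-01 01:10:00+00:00",
    "2021-01-01 01:15:00+00:00",
    "2021-01-04 06:00:00+00:00",
    "2021-01-04 06:05:00+00:00",
]
periods = summarize_periods(raw)
print(periods)
```

Unlike the snippet above, this variant keeps the UTC timezone on `start` and `end`. Note that a gap of exactly 5 minutes stays in the same period, because the comparison is strictly "greater than".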
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
