How to calculate timestamp differences in a sequence in Python

My DataFrame looks like this:

0   2021-01-01 01:00:00+ 00:00
1   2021-01-01 01:05:00+ 00:00
2   2021-01-01 01:10:00+ 00:00
3   2021-01-01 01:15:00+ 00:00
4   2021-01-04 06:00:00+ 00:00
5   2021-01-04 06:05:00+ 00:00

This column holds timestamps. I want to calculate the duration of each period (a run of rows with no gap larger than 5 minutes between consecutive rows), along with its start and end. For this example, I want to get the following result:

  • 15 minutes from 2021-01-01 01:00:00+ 00:00 till 2021-01-01 01:15:00+ 00:00
  • 5 minutes from 2021-01-04 06:00:00+ 00:00 till 2021-01-04 06:05:00+ 00:00

How can I do this?



Solution 1:[1]

IIUC, you could use a custom group and agg:

import pandas as pd

# parse the strings as datetimes (skip if the column is already datetime)
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S+ 00:00')

# start a new group whenever the gap to the previous row exceeds 5 minutes
group = df['timestamp'].diff().gt('5min').cumsum()

# aggregate each group into its start, end, and duration
out = (df
 .groupby(group)['timestamp']
 .agg(start='min', end='max', delta=lambda g: g.max() - g.min())
)

Output:

                        start                 end           delta
timestamp                                                        
0         2021-01-01 01:00:00 2021-01-01 01:15:00 0 days 00:15:00
1         2021-01-04 06:00:00 2021-01-04 06:05:00 0 days 00:05:00
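
To see how the group labels come about, here is a minimal self-contained sketch that rebuilds the sample data from the question and splits the one-liner into its intermediate steps. The column name `timestamp` is assumed (the question does not name it), and the names `gap`, `new_period` and `minutes` are only illustrative. It also adds the duration as a plain number of minutes, which may be closer to what the question asks for.

```python
import pandas as pd

# Sample data copied from the question; the column name `timestamp` is assumed.
df = pd.DataFrame({"timestamp": [
    "2021-01-01 01:00:00+ 00:00",
    "2021-01-01 01:05:00+ 00:00",
    "2021-01-01 01:10:00+ 00:00",
    "2021-01-01 01:15:00+ 00:00",
    "2021-01-04 06:00:00+ 00:00",
    "2021-01-04 06:05:00+ 00:00",
]})
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S+ 00:00")

# Gap to the previous row; the first row has no predecessor, so its gap is NaT.
gap = df["timestamp"].diff()

# True where a new period starts (gap strictly greater than 5 minutes).
# NaT compares as False, so the first row stays in the first group.
new_period = gap > pd.Timedelta(minutes=5)

# A running count of period starts gives one integer label per period.
group = new_period.cumsum()

# Start and end per period, then the duration as a timedelta and in minutes.
out = df.groupby(group)["timestamp"].agg(start="min", end="max")
out["delta"] = out["end"] - out["start"]
out["minutes"] = out["delta"].dt.total_seconds() / 60

print(out)
```

Because the comparison is strict, a gap of exactly 5 minutes keeps a row in the current period; only a larger gap starts a new one. Use `>=` instead if a 5 minute gap should already split the periods.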

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway