Counting the number of overlapping periods for a given date_range
I have the following dataframe:
import pandas as pd

df = pd.DataFrame({
    'user': [0, 1, 2, 3, 4],
    'start': ['2021-01-01 09:52:37', '2021-01-01 11:45:34', '2021-01-01 12:04:50',
              '2021-01-01 12:07:19', '2021-01-01 12:14:59'],
    'end': ['2021-01-01 10:52:37', '2021-01-01 12:47:34', '2021-01-01 12:57:50',
            '2021-01-01 13:40:19', '2021-01-01 13:53:59']})
| index | start | end |
|---|---|---|
| 0 | 2021-01-01 09:52:37 | 2021-01-01 10:52:37 |
| 1 | 2021-01-01 11:45:34 | 2021-01-01 12:47:34 |
| 2 | 2021-01-01 12:04:50 | 2021-01-01 12:57:50 |
| 3 | 2021-01-01 12:07:19 | 2021-01-01 13:40:19 |
| 4 | 2021-01-01 12:14:59 | 2021-01-01 13:53:59 |
I am trying to count the number of active sessions in each 5-minute bin of the following range:
range_t = pd.date_range(start = '2021-01-01 08:00', end = '2021-01-01 23:59:59', freq = '5min')
I have created a function that turns each start-end pair into a Period:
def create_period(start, end):
    # The session length (end - start) is used as the Period's frequency.
    return pd.Period(start, freq=end - start)

ts = ts.assign(period=ts.apply(
    lambda x: create_period(x['SESSION_START_DT'], x['SESSION_END_DT']), axis=1))
However, I did not find a built-in function to check whether a given timestamp falls inside a period. Is there a faster and better way to go about it?
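One possible way to test containment, sketched here under the assumption that the `start` and `end` columns are first converted to datetimes, is to build a `pd.IntervalIndex` instead of `Period` objects and call its `contains` method. The variable names `sessions`, `t`, `active` and `n_active` are illustrative only and not part of the original code.

```python
import pandas as pd

# Sample data from the question, with start/end parsed as datetimes (assumption).
df = pd.DataFrame({
    'user': [0, 1, 2, 3, 4],
    'start': pd.to_datetime(['2021-01-01 09:52:37', '2021-01-01 11:45:34',
                             '2021-01-01 12:04:50', '2021-01-01 12:07:19',
                             '2021-01-01 12:14:59']),
    'end': pd.to_datetime(['2021-01-01 10:52:37', '2021-01-01 12:47:34',
                           '2021-01-01 12:57:50', '2021-01-01 13:40:19',
                           '2021-01-01 13:53:59']),
})

# One closed interval [start, end] per session.
sessions = pd.IntervalIndex.from_arrays(df['start'], df['end'], closed='both')

# Boolean array marking the sessions that contain the timestamp t.
t = pd.Timestamp('2021-01-01 12:10:00')
active = sessions.contains(t)
n_active = int(active.sum())
print(n_active)  # 3 sessions (users 1, 2 and 3) are active at 12:10
```

This checks one timestamp at a time, so it still needs a loop over the bins; it mainly replaces the hand-written comparison with an interval type.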
Update: A straightforward attempt:
range_t = pd.date_range(start = '2021-01-01 08:00', end = '2021-01-01 23:59:59', freq = '5min')
z = dict()
for t in range_t:
    # Count the sessions whose [start, end] interval contains t.
    z[t] = ((one_post['SESSION_START_DT'] <= t) & (t <= one_post['SESSION_END_DT'])).sum()
ts = pd.DataFrame(z.items(), columns=['Timestamp', 'count'])
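A vectorized alternative to the loop above might look like the following sketch. It is not from the original post: it assumes the sample `df` with `start` and `end` parsed as datetimes, and relies on the observation that the number of sessions active at `t` equals the number that started at or before `t` minus the number that ended strictly before `t`. Both counts can be read off sorted arrays with `np.searchsorted`. The names `starts`, `ends`, `bins` and `counts` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with start/end parsed as datetimes (assumption).
df = pd.DataFrame({
    'user': [0, 1, 2, 3, 4],
    'start': pd.to_datetime(['2021-01-01 09:52:37', '2021-01-01 11:45:34',
                             '2021-01-01 12:04:50', '2021-01-01 12:07:19',
                             '2021-01-01 12:14:59']),
    'end': pd.to_datetime(['2021-01-01 10:52:37', '2021-01-01 12:47:34',
                           '2021-01-01 12:57:50', '2021-01-01 13:40:19',
                           '2021-01-01 13:53:59']),
})
range_t = pd.date_range(start='2021-01-01 08:00', end='2021-01-01 23:59:59', freq='5min')

# Sort starts and ends independently; only the counts matter, not the pairing.
starts = np.sort(df['start'].to_numpy())
ends = np.sort(df['end'].to_numpy())
bins = range_t.to_numpy()

# side='right' counts starts <= t; side='left' counts ends < t.
started = np.searchsorted(starts, bins, side='right')
ended = np.searchsorted(ends, bins, side='left')

# Sessions active at each bin timestamp (inclusive on both ends, like the loop).
counts = pd.Series(started - ended, index=range_t, name='count')
print(counts[counts > 0])
```

With n sessions and m bins this costs roughly O((n + m) log n) instead of the O(n * m) comparisons made by the loop, which may matter for large session tables.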
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
