Group time stamps based on intervals
I have a dataset that looks like this:
main_id time_stamp
aaa 2019-05-29 08:16:05+05
aaa 2019-05-30 00:11:05+05
aaa 2020-05-30 09:15:07+05
bbb 2019-05-29 09:11:05+05
For each main_id, I want to sort the time_stamps from earliest to latest.
I want to create a new column day, which uses the time_stamp to derive a number that describes the business day.
Business days are defined like this (example):
Monday 05:00 - Tuesday 01:00 => (1 Business Day, i.e. Monday)
Tuesday 05:00 - Wednesday 01:00 => (1 Business Day, i.e. Tuesday)
and so on...
This means that the first and second rows with main_id = aaa are from the same business day, because the second row's time is before 1 am on the next day.
However, the timestamp in the third row is from another business day, so its day is 2.
The end result could look something like this:
main_id time_stamp day
aaa 2019-05-29 08:16:05+05 1
aaa 2019-05-30 00:11:05+05 1
aaa 2020-05-30 09:15:07+05 2
bbb 2019-05-29 09:11:05+05 1
Day 1 would be anywhere between the first 5:00 am and 1:00 am the next day, while day 2 would be the next possible business day (the next 5:00 am to 1:00 am window).
How can I achieve this?
Solution 1:[1]
You can map each weekday to a business-day number in business_days, then compare each timestamp's time of day against the 05:00 and 01:00 boundaries:
from datetime import time
import pandas as pd

df['time_stamp'] = pd.to_datetime(df['time_stamp'])

# Keys follow strftime('%w'): 0 = Sunday, 1 = Monday, ..., 6 = Saturday
business_days = {
    1: 1, # Mon
    2: 2, # Tue
    3: 3, # Wed
    4: 4, # Thu
    5: 5, # Fri
}

def map_business(x):
    # -1 means not a business day
    week = int(x.strftime('%w'))
    if week in business_days:
        if x.time() >= time(5, 00):
            return business_days[week]
        elif x.time() < time(1, 00):
            # before 01:00 still belongs to the previous business day, if there is one
            return business_days.get(week - 1, -1)
        else:
            return -1
    elif (week - 1) in business_days:
        # e.g. Saturday before 01:00 still counts as Friday
        if x.time() < time(1, 00):
            return business_days[week - 1]
        else:
            return -1
    else:
        return -1

df['days'] = df['time_stamp'].apply(map_business)
df = df.sort_values(['main_id', 'time_stamp'])
# renumber the business days that actually occur as 1, 2, ...; rows outside any business day stay -1
df['days'] = df[df['days'].ne(-1)].groupby('days').ngroup() + 1
df['days'] = df['days'].fillna(-1).astype(int)
main_id time_stamp days
0 aaa 2019-05-29 08:16:05+05:00 1
1 aaa 2019-05-30 00:11:05+05:00 1
2 aaa 2020-05-30 09:15:07+05:00 -1
3 bbb 2019-05-29 09:11:05+05:00 1
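Note that the output above marks the third row as -1 because 2020-05-30 is a Saturday, whereas the question expects consecutive day numbers per main_id (1, 1, 2 for aaa). A minimal sketch of that alternative follows. It is not part of the original answer, and it assumes that every weekday (including weekends) counts as a business day and that each window runs from 05:00 to 01:00 the next morning. The column names business_date, in_gap and day are illustrative choices.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "main_id": ["aaa", "aaa", "aaa", "bbb"],
    "time_stamp": [
        "2019-05-29 08:16:05+05:00",
        "2019-05-30 00:11:05+05:00",
        "2020-05-30 09:15:07+05:00",
        "2019-05-29 09:11:05+05:00",
    ],
})
df["time_stamp"] = pd.to_datetime(df["time_stamp"])
df = df.sort_values(["main_id", "time_stamp"]).reset_index(drop=True)

# Assumption: a business day starts at 05:00, so moving every timestamp
# back by 5 hours places it on the calendar date of its business day.
shifted = (df["time_stamp"] - pd.Timedelta(hours=5)).dt.tz_localize(None)
df["business_date"] = shifted.dt.normalize()

# After the shift, the 01:00-05:00 gap shows up as 20:00-24:00 local time.
df["in_gap"] = shifted.dt.hour >= 20

# Number the distinct business dates within each main_id: 1, 2, 3, ...
df["day"] = (
    df.groupby("main_id")["business_date"]
      .rank(method="dense")
      .astype(int)
)

print(df[["main_id", "time_stamp", "day"]])
```

Rows where in_gap is True fall between 01:00 and 05:00 and belong to no business day under this definition. How to treat them (drop them, or set day to -1) is left open here.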
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

