'generate date_range for string intervals like 09:00-10:00/11:00-14:00/15:00-18:00 with BuiltIn functions for pandas
I've been reading the forum, investigating on internet. But can't figure out how to apply a pandas functions to resume this whole code:
def get_time_and_date(schedule, starting_date, position):
# calculate time and date for each start and ending time if the ending time < starting time, add one day to the ending.
my_time = datetime.strptime(schedule.split('-')[position], '%H:%M')
my_date = datetime.strptime(starting_date, '%Y-%m-%d')
# get the starting hour for the range if we are calculating last interval
if position == 1:
starting_hour = datetime.strptime(schedule.split('-')[0], '%H:%M')
starting_hour = datetime(my_date.year, my_date.month, my_date.day, starting_hour.hour, 0)
# convert unify my_time and my_date normalizing the minutes
if hora.minute >= 30:
my_hour_and_date = datetime(my_date.year, my_date.month, my_date.day, my_time.hour, 30)
else:
my_hour_and_date = datetime(my_date.year, my_date.month, my_date.day, hora.hour, 0)
# if final time of the day < than starting time, means there is a day jump, so we add a day
if position == 1 and my_hour_and_date < starting_hour: my_hour_and_date += timedelta(days=1)
return my_hour_and_date
def get_time_interval_ranges(schedule, my_date):
# get all match schedules if there are any
schedules = schedule.split('/')
intervals_list = []
# loop through al the schedules and add the range with the split separator "Separa aquí"
for my_schedule in schedules:
current_range = pd.date_range(start=get_time_and_date(my_schedule, my_date, 0), end=get_time_and_date(my_schedule, my_date, 1), freq="30min").strftime('%Y-%m-%d, %H:%M').to_list()
intervals_list += current_range
intervals_list.append('separate_range_here')
return intervals_list
def generate_time_intervals(df, column_to_process, new_column):
#generate range of times column
df[new_column] = df.apply(lambda row: get_time_interval_ranges(row[column_to_process], row['my_date'], True), axis=1)
return df
I believe there is a better way to do this, but I can't find out how. What I'm giving to the first function(generate_time_intervals) is a dataFrame with some columns but only Date (yyyy-mm-dd) and schedule are important.
When the schedule is 09:00-15:00 it's easy, just split by the "-" and give it to the builtint function data_range. The problem comes to handle horrendous times like the one on the title or the likes of 09:17-16:24.
Is there any way to handle this without so much looping and the sorts in my code?
Edit:
With this input:
| Worker | Date | Schedule |
|---|---|---|
| Worker1 | 2022-05-01 | 09:00-10:00/11:00-14:00/15:00-18:00 |
| Worker2 | 2022-05-01 | 09:37-15:38 |
I would like this output:
| Date | Interval | Working Minutes |
|---|---|---|
| 2022-05-01 | 09:00 | 30 |
| 2022-05-01 | 09:30 | 53 |
| 2022-05-01 | 10:00 | 30 |
| 2022-05-01 | 10:30 | 30 |
| 2022-05-01 | 11:00 | 60 |
| 2022-05-01 | 11:30 | 60 |
| 2022-05-01 | 12:00 | 60 |
| 2022-05-01 | 12:30 | 60 |
| 2022-05-01 | 13:00 | 60 |
| 2022-05-01 | 13:30 | 60 |
| 2022-05-01 | 14:00 | 30 |
| 2022-05-01 | 14:30 | 30 |
| 2022-05-01 | 15:00 | 60 |
| 2022-05-01 | 15:30 | 38 |
| 2022-05-01 | 16:00 | 30 |
| 2022-05-01 | 16:30 | 30 |
| 2022-05-01 | 17:00 | 30 |
| 2022-05-01 | 17:30 | 30 |
| 2022-05-01 | 18:00 | 0 |
Solution 1:[1]
Working with datetime:
df= pd.DataFrame({'schedule':['09:17-16:24','19:40-21:14']})
schedules = df.schedule.str.split('-',expand=True)
start = pd.to_datetime(schedules[0]).dt.round('H')
end = pd.to_datetime(schedules[1]).dt.round('H')
df['interval_out'] = start.dt.hour.astype(str) + ':00 - ' + end.dt.hour.astype(str) + ':00'
And result:
>>> df
schedule
0 09:17-16:24
1 19:40-21:14
>>> schedules
0 1
0 09:17 16:24
1 19:40 21:14
>>> start
0 2022-05-18 09:00:00
1 2022-05-18 20:00:00
Name: 0, dtype: datetime64[ns]
>>> end
0 2022-05-18 16:00:00
1 2022-05-18 21:00:00
Name: 1, dtype: datetime64[ns]
>>> df
schedule interval_out
0 09:17-16:24 9:00 - 16:00
1 19:40-21:14 20:00 - 21:00
>>>
Of course the rounding should be floor & ceil if you want to expand it...
EDIT: Trying the original question... It also helps if you read about datetime functions in Pandas (which now I learnt...):facepalm:
- Expand the blocks into individual items start/stop
- Floor / ceil them for the start/stop
- Calculate the intervals using a convenient pandas function...
- Explode the intervals as rows
- Calculate the late start
- Calculate soon stop
- Calculate how many people were actually in the office
- Group data on slots, adding lost minutes and worked minutes * worker
- Do the calculation
df['timeblocks']= df.Schedule.str.split('/')
df2 = df.explode('timeblocks')
timeblocks = df2.timeblocks.str.split('-',expand=True)
df2['start'] = pd.to_datetime(df2.Date + " " + timeblocks[0])
df2['stop'] = pd.to_datetime(df2.Date + " " + timeblocks[1])
df2['start_slot'] = df2.start.dt.floor('30min')
df2['stop_slot'] = df2.stop.dt.ceil('30min')
df2['intervals'] = df2.apply(lambda x: pd.date_range(x.start_slot, x.stop_slot, freq='30min'), axis=1)
df3 = df2.explode('intervals')
df3['late_start'] = (df3.start>df3.intervals)*(df3.start-df3.intervals).dt.seconds/60
df3['soon_stop']= ((df3.stop>df3.intervals) & (df3.stop<(df3.intervals+pd.Timedelta('30min'))))*((df3.intervals+pd.Timedelta('30min'))-df3.stop).dt.seconds/60
df3['someone'] = (df3.start<df3.intervals+pd.Timedelta('30min'))&(df3.stop>df3.intervals)#+pd.Timedelta('30min'))
df4 = df3.groupby('intervals').agg({'late_start':sum, 'soon_stop':sum, 'someone':sum})
df4['worked_time'] = df4.someone*30 - df4.late_start - df4.soon_stop
df4
>>> df4
late_start soon_stop someone worked_time
intervals
2022-05-01 09:00:00 0.0 0.0 1 30.0
2022-05-01 09:30:00 7.0 0.0 2 53.0
2022-05-01 10:00:00 0.0 0.0 1 30.0
2022-05-01 10:30:00 0.0 0.0 1 30.0
2022-05-01 11:00:00 0.0 0.0 2 60.0
2022-05-01 11:30:00 0.0 0.0 2 60.0
2022-05-01 12:00:00 0.0 0.0 2 60.0
2022-05-01 12:30:00 0.0 0.0 2 60.0
2022-05-01 13:00:00 0.0 0.0 2 60.0
2022-05-01 13:30:00 0.0 0.0 2 60.0
2022-05-01 14:00:00 0.0 0.0 1 30.0
2022-05-01 14:30:00 0.0 0.0 1 30.0
2022-05-01 15:00:00 0.0 0.0 2 60.0
2022-05-01 15:30:00 0.0 22.0 2 38.0
2022-05-01 16:00:00 0.0 0.0 1 30.0
2022-05-01 16:30:00 0.0 0.0 1 30.0
2022-05-01 17:00:00 0.0 0.0 1 30.0
2022-05-01 17:30:00 0.0 0.0 1 30.0
2022-05-01 18:00:00 0.0 0.0 0 0.0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
