Calculate time intersection between 2 time ranges
I have a pandas DataFrame of time ranges and I want to calculate, for each date, the overlap in minutes between the range (FROM_TIME - TO_TIME) and the window (23:00 - 07:00).
DATE FROM_TIME TO_TIME
2015-01-01 2354 0408
2015-01-02 0200 0741
2015-01-03 1800 0811
2015-01-04 0015 0756
2015-01-05 0024 0259
So, for example, on the first date the overlap is 254 min (4 h 14 min) and on the second it is 300 min (5 h). The expected output would be:
DATE FROM_TIME TO_TIME intersection
2015-01-01 2354 0408 254.0
2015-01-02 0200 0741 300.0
2015-01-03 1800 0811 480.0
2015-01-04 0015 0756 405.0
2015-01-05 0024 0259 155.0
I tried the following:
import pandas as pd

sample = {'Date': ['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05'],
          'FROM_TIME': ['2354', '0200', '1800', '0015', '0024'],
          'TO_TIME': ['0408', '0741', '0811', '0756', '0259']}
dftest = pd.DataFrame.from_dict(sample)

def get_intersection(x):
    a = pd.to_datetime(x['FROM_TIME'], format='%H%M', errors='coerce')
    b = pd.to_datetime(x['TO_TIME'], format='%H%M', errors='coerce')
    c = pd.to_datetime("2300", format='%H%M')
    d = pd.to_datetime("0700", format='%H%M')
    latest_start = max(a, c)
    earliest_end = min(b, d)
    delta = pd.Timedelta(earliest_end - latest_start).seconds/60
    overlap = max(0, delta)
    return overlap

dftest['intersection'] = dftest.apply(get_intersection, axis=1)
dftest
Date FROM_TIME TO_TIME intersection
2015-01-01 2354 0408 254.0
2015-01-02 0200 0741 480.0
2015-01-03 1800 0811 480.0
2015-01-04 0015 0756 480.0
2015-01-05 0024 0259 239.0
But the output is not correct. I understand that this is because the max and min functions return the wrong times in some cases, but how can I calculate the intersection in Python?
Edit
I have changed the function to
def get_intersection(x):
    departure_time = pd.to_datetime(x['DEPARTURE_TIME'], format='%H%M', errors='coerce')
    arrival_time = pd.to_datetime(x['ARRIVAL_TIME'], format='%H%M', errors='coerce')
    upper_time = pd.to_datetime("2300", format='%H%M')
    lower_time = pd.to_datetime("0700", format='%H%M')
    if departure_time > arrival_time:
        latest_start = max(departure_time, upper_time)
        earliest_end = min(arrival_time, lower_time)
    else:
        if departure_time > lower_time:
            latest_start = lower_time
            earliest_end = lower_time
        else:
            latest_start = min(departure_time, upper_time)
            earliest_end = min(arrival_time, lower_time)
    delta = (earliest_end - latest_start).seconds/60
    print(f'departure_time = {departure_time}, arrival_time = {arrival_time}\nlatest_start = {latest_start}, earliest_end ={earliest_end}, delta = {delta}')
    overlap = max(0, delta)
    return overlap
This seems to calculate the result that I want, although it is very slow on the dataset I want to work with, as it has millions of rows.
Solution 1:[1]
Assuming you want the difference between FROM_TIME and TO_TIME, and that TO_TIME is always after FROM_TIME, you can convert both to datetime and compute the difference. If it is negative, add one day. Then convert to seconds and divide by 60 to get minutes.
s = (pd.to_datetime(dftest['TO_TIME'], format='%H%M')
-pd.to_datetime(dftest['FROM_TIME'], format='%H%M'))
dftest['intersection'] = s.mask(s.lt('0d'), pd.to_timedelta('1d')+s).dt.total_seconds()//60
Output:
Date FROM_TIME TO_TIME intersection
0 2015-01-01 2354 0408 254.0
1 2015-01-02 0200 0741 341.0
2 2015-01-03 1800 0811 851.0
3 2015-01-04 0015 0756 461.0
4 2015-01-05 0024 0259 155.0
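Note that the output above is the total length of each range, so rows 1 to 3 differ from the expected values in the question (341 vs 300, 851 vs 480, 461 vs 405). If the goal is the overlap with the 23:00 - 07:00 window, one possible vectorized approach is to work in minutes since midnight and clip each range against the nightly window. The sketch below is only an illustration, not part of the original answer: the helper names `hhmm_to_minutes` and `night_overlap` are hypothetical, and it assumes zero-padded 'HHMM' strings, ranges shorter than 24 hours, and that TO_TIME earlier than FROM_TIME means the range ends on the following day.

```python
import numpy as np
import pandas as pd

DAY = 24 * 60  # minutes in a day


def hhmm_to_minutes(s: pd.Series) -> pd.Series:
    """Convert zero-padded 'HHMM' strings to minutes since midnight."""
    return s.str[:2].astype(int) * 60 + s.str[2:].astype(int)


def night_overlap(from_time: pd.Series, to_time: pd.Series,
                  night_start: int = 23 * 60, night_end: int = 7 * 60) -> pd.Series:
    """Minutes of each [from, to) range that fall inside the nightly window.

    Hypothetical helper. Assumes the window crosses midnight
    (night_end < night_start) and every range is shorter than 24 hours.
    """
    start = hhmm_to_minutes(from_time).to_numpy()
    end = hhmm_to_minutes(to_time).to_numpy()
    # A TO_TIME earlier than FROM_TIME is taken to be on the next day.
    end = np.where(end < start, end + DAY, end)

    total = np.zeros(len(start), dtype=np.int64)
    # The window repeats daily: a range can touch the window that began the
    # previous evening, the one starting tonight, or the one after that.
    for offset in (-DAY, 0, DAY):
        w_start = night_start + offset
        w_end = night_end + DAY + offset
        total += np.clip(np.minimum(end, w_end) - np.maximum(start, w_start), 0, None)
    return pd.Series(total, index=from_time.index)


sample = {'Date': ['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05'],
          'FROM_TIME': ['2354', '0200', '1800', '0015', '0024'],
          'TO_TIME': ['0408', '0741', '0811', '0756', '0259']}
dftest = pd.DataFrame(sample)
dftest['intersection'] = night_overlap(dftest['FROM_TIME'], dftest['TO_TIME'])
print(dftest)
```

On the sample data this yields 254, 300, 480, 405 and 155 minutes, matching the expected output in the question. Because it operates on whole columns rather than calling a function per row with `apply`, it should scale better to millions of rows, though that has not been measured here.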
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
