Calculate time intersect between 2 time ranges

I have a pandas DataFrame of time ranges, and for each date I want to calculate the overlap in minutes between the range (FROM_TIME - TO_TIME) and the window (23:00 - 07:00).

   DATE     FROM_TIME   TO_TIME
2015-01-01    2354       0408
2015-01-02    0200       0741
2015-01-03    1800       0811
2015-01-04    0015       0756
2015-01-05    0024       0259

So, for example, for the first date the overlap is 254m (4h and 14m) and for the second it is 300m (5h). The expected output would be:

   DATE     FROM_TIME   TO_TIME  intersection
2015-01-01    2354       0408       254.0
2015-01-02    0200       0741       300.0 
2015-01-03    1800       0811       480.0
2015-01-04    0015       0756       405.0
2015-01-05    0024       0259       155.0

I tried the following:

import pandas as pd

sample = {'Date': ['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05'], 
          'FROM_TIME':['2354', '0200', '1800', '0015', '0024'], 
          'TO_TIME':['0408', '0741', '0811', '0756', '0259']}

dftest = pd.DataFrame.from_dict(sample)
    
def get_intersection(x):
    
    a=pd.to_datetime(x['FROM_TIME'],format='%H%M',errors='coerce')
    b=pd.to_datetime(x['TO_TIME'],format='%H%M',errors='coerce')
    
    c=pd.to_datetime("2300",format='%H%M')
    d=pd.to_datetime("0700",format='%H%M')
    
    latest_start = max(a, c)
    earliest_end = min(b, d)
    delta = pd.Timedelta(earliest_end - latest_start).seconds/60

    overlap = max(0, delta)
    return overlap

dftest['intersection']=dftest.apply(get_intersection, axis=1)
dftest

    Date      FROM_TIME  TO_TIME  intersection
  2015-01-01    2354      0408      254.0
  2015-01-02    0200      0741      480.0
  2015-01-03    1800      0811      480.0
  2015-01-04    0015      0756      480.0
  2015-01-05    0024      0259      239.0

But the output is not correct. I understand that this is because the max and min functions return the wrong times in some cases, but how can I calculate the intersection in Python?

Edit

I have changed the function to

def get_intersection(x):
    
    # Column names match the sample DataFrame above (FROM_TIME / TO_TIME)
    departure_time=pd.to_datetime(x['FROM_TIME'],format='%H%M',errors='coerce')
    arrival_time=pd.to_datetime(x['TO_TIME'],format='%H%M',errors='coerce')
    
    upper_time=pd.to_datetime("2300",format='%H%M')
    lower_time=pd.to_datetime("0700",format='%H%M')
    
    if departure_time > arrival_time:
        latest_start = max(departure_time, upper_time)
        earliest_end = min(arrival_time, lower_time)
    else:
        if departure_time > lower_time:
            latest_start = lower_time
            earliest_end = lower_time
        else:
            latest_start = min(departure_time, upper_time)
            earliest_end = min(arrival_time, lower_time)
        
   
    delta = (earliest_end - latest_start).seconds/60
    print(f'departure_time = {departure_time}, arrival_time = {arrival_time}\nlatest_start = {latest_start}, earliest_end ={earliest_end}, delta = {delta}')
    overlap = max(0, delta)
    return overlap

This seems to calculate the result that I want, although it is very slow on the dataset I want to work with, as it has millions of rows.



Solution 1:[1]

Assuming you want the difference between FROM_TIME and TO_TIME, and that TO_TIME is always after FROM_TIME, you can convert both to datetime and compute the difference. If it is negative, add one day. Then convert to seconds and floor-divide by 60 to get minutes.

s = (pd.to_datetime(dftest['TO_TIME'], format='%H%M')
     -pd.to_datetime(dftest['FROM_TIME'], format='%H%M'))

dftest['intersection'] = s.mask(s.lt('0d'), pd.to_timedelta('1d')+s).dt.total_seconds()//60

Output:

         Date FROM_TIME TO_TIME  intersection
0  2015-01-01      2354    0408         254.0
1  2015-01-02      0200    0741         341.0
2  2015-01-03      1800    0811         851.0
3  2015-01-04      0015    0756         461.0
4  2015-01-05      0024    0259         155.0
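
Note that these values are the full duration of each range, so rows 1 to 3 differ from the expected output in the question (for example 341 instead of 300 for 2015-01-02). A minimal vectorized sketch of the overlap with the 23:00 - 07:00 window follows. It is not part of the original answer: the helper names to_minutes and night_overlap_minutes are hypothetical, and it assumes that times are zero-padded HHMM strings, that a TO_TIME earlier than FROM_TIME means the range ends on the next day, and that no range lasts longer than 24 hours.

```python
import numpy as np
import pandas as pd

def to_minutes(hhmm):
    """Convert a Series of 'HHMM' strings to minutes since midnight."""
    hhmm = hhmm.astype(str).str.zfill(4)
    return hhmm.str[:2].astype(int) * 60 + hhmm.str[2:].astype(int)

def night_overlap_minutes(from_time, to_time):
    """Minutes of each range that fall inside 23:00 - 07:00 (hypothetical helper)."""
    start = to_minutes(from_time).to_numpy()
    end = to_minutes(to_time).to_numpy()
    # Assumption: an end earlier than the start means the range crosses midnight
    end = np.where(end < start, end + 1440, end)

    # Night windows on a minute axis where 0 is midnight of the start day:
    # the tail of the previous night, the coming night, and the night after
    windows = [(-60, 420), (1380, 1860), (2820, 3300)]

    total = np.zeros(len(start), dtype=float)
    for w_start, w_end in windows:
        # Overlap of [start, end] with one window, floored at zero
        total += np.clip(np.minimum(end, w_end) - np.maximum(start, w_start), 0, None)
    return total

sample = {'Date': ['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05'],
          'FROM_TIME': ['2354', '0200', '1800', '0015', '0024'],
          'TO_TIME': ['0408', '0741', '0811', '0756', '0259']}
dftest = pd.DataFrame(sample)
dftest['intersection'] = night_overlap_minutes(dftest['FROM_TIME'], dftest['TO_TIME'])
print(dftest)
```

Working in plain minutes avoids both the row-wise apply and datetime parsing, which may help on millions of rows. The three fixed windows cover every position a range of at most 24 hours can take.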

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway