'How to exclude weekends and holidays from finding the difference between two dates in python

I need to find the difference between 2 dates where certain end dates are blank. I am need to exclude the weekends, as well as the holidays when calculating the dates. I also need to put into account the blank end_dates.

I have a data frame which looks like:

start_date end_date
01-01-2020 05-01-2020
30-10-2021 NaT
15-08-2019 NaT
29-06-2020 15-07-2020

The code for retrieving the holidays I wrote as the following:

df = read_excel(r'dates.xlsx')
df.head()
us_holidays = holidays.UnitesStates()

The following code works around the null values and it excludes the weekends

def business_days(start, end):
    mask = pd.notnull(start) & pd.notnull(end)
    start = start.values.astype('datetime64[D]')[mask]
    end = end.values.astype('datetime64[D]')[mask]
    holi = us_holidays.values.astype('datetime64[D]')[mask]
    result = np.empty(len(mask), dtype=float)
    result[mask] = np.busday_count(start, end, holidays= holi)
    result[~mask] = np.nan
    return result



df['count'] = business_days(df['start_date'], df['end_date'])

The error I get is:

AttributeError: 'builtin_function_or_method' object has no attribute 'astype'

How can I fix the following error?

Any help will be greatly appreciated, thanks.



Solution 1:[1]

  1. I'm not familiar with the holiday package. But holidays.UnitesStates() seems to return an object and not the needed list of dates. However you can create a list of holiday dates for a certain range of years.

  2. I'm not sure why you get "NaT", ususally you get NaNs. But you can handle both.

One way to do it:

   import holidays
   import pandas as pd
   import numpy as np
   import datetime
   
   #Create Dummy DataFrame:
   df = pd.DataFrame(columns=['start_date','end_date'])
   df['start_date'] = np.array(["2020-01-01","2021-10-30","2019-08-15","2020-06-29"])
   df['end_date'] = np.array(["2020-01-05","NaT","NaT", "2020-07-15"])
   
   #Convert Columns to datetime
   df['start_date'] = pd.to_datetime(df['start_date'])
   df['end_date'] = pd.to_datetime(df['end_date'])
   
   #Convert DateTime to Date
   df['start_date'] = df['start_date'].dt.date
   df['end_date'] = df['end_date'].dt.date
   
   #Not sure why you get NaT when you read the file with pandas. So replace it with today:
   df = df.replace({'NaT': datetime.date.today()})
   #In case you get a NaN:
   df = df.fillna(datetime.date.today())
   
   #Get First and Last Year
   max_year = df.max().max().year
   min_year = df.min().min().year
   
   #holidays.US returns an object, so you have to create a list
   us_holidays = list()
   for date,name in sorted(holidays.US(years=list(range(min_year,max_year+1))).items()):
          us_holidays.append(date)
   
   start_dates = list(df['start_date'].values)
   end_dates = list(df['end_date'].values)
   
   df['count'] = np.busday_count(start_dates, end_dates, holidays = us_holidays)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 SebastianB