'How to exclude weekends and holidays from finding the difference between two dates in python
I need to find the difference between 2 dates where certain end dates are blank. I am need to exclude the weekends, as well as the holidays when calculating the dates. I also need to put into account the blank end_dates.
I have a data frame which looks like:
start_date | end_date |
---|---|
01-01-2020 | 05-01-2020 |
30-10-2021 | NaT |
15-08-2019 | NaT |
29-06-2020 | 15-07-2020 |
The code for retrieving the holidays I wrote as the following:
df = read_excel(r'dates.xlsx')
df.head()
us_holidays = holidays.UnitesStates()
The following code works around the null values and it excludes the weekends
def business_days(start, end):
mask = pd.notnull(start) & pd.notnull(end)
start = start.values.astype('datetime64[D]')[mask]
end = end.values.astype('datetime64[D]')[mask]
holi = us_holidays.values.astype('datetime64[D]')[mask]
result = np.empty(len(mask), dtype=float)
result[mask] = np.busday_count(start, end, holidays= holi)
result[~mask] = np.nan
return result
df['count'] = business_days(df['start_date'], df['end_date'])
The error I get is:
AttributeError: 'builtin_function_or_method' object has no attribute 'astype'
How can I fix the following error?
Any help will be greatly appreciated, thanks.
Solution 1:[1]
I'm not familiar with the holiday package. But holidays.UnitesStates() seems to return an object and not the needed list of dates. However you can create a list of holiday dates for a certain range of years.
I'm not sure why you get "NaT", ususally you get NaNs. But you can handle both.
One way to do it:
import holidays
import pandas as pd
import numpy as np
import datetime
#Create Dummy DataFrame:
df = pd.DataFrame(columns=['start_date','end_date'])
df['start_date'] = np.array(["2020-01-01","2021-10-30","2019-08-15","2020-06-29"])
df['end_date'] = np.array(["2020-01-05","NaT","NaT", "2020-07-15"])
#Convert Columns to datetime
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])
#Convert DateTime to Date
df['start_date'] = df['start_date'].dt.date
df['end_date'] = df['end_date'].dt.date
#Not sure why you get NaT when you read the file with pandas. So replace it with today:
df = df.replace({'NaT': datetime.date.today()})
#In case you get a NaN:
df = df.fillna(datetime.date.today())
#Get First and Last Year
max_year = df.max().max().year
min_year = df.min().min().year
#holidays.US returns an object, so you have to create a list
us_holidays = list()
for date,name in sorted(holidays.US(years=list(range(min_year,max_year+1))).items()):
us_holidays.append(date)
start_dates = list(df['start_date'].values)
end_dates = list(df['end_date'].values)
df['count'] = np.busday_count(start_dates, end_dates, holidays = us_holidays)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | SebastianB |