Filter data-frame for rows with dates outside a date range
I have a data-frame df
where the head looks like:
    identifier  department  organisation  status change date
1           14     Finance      Accounts          19/09/2018
2           19   Marketing   Advertising          19/09/2016
22         288  Production            IT          03/01/2017
27         352  Facilities       Kitchen          31/01/2017
54         790   Relations         Sales          31/03/2017
df
has several thousand records in it. I also have 2 date variables - the start date and end date of a reference period as strings (arguments from the command line) called:
referencePeriodStartDate
and referencePeriodEndDate
which currently equal:
referencePeriodStartDate = 01/01/2017
referencePeriodEndDate = 30/03/2017
I am trying to return any records from df
which have a status change date that falls outside the reference period defined by referencePeriodStartDate
and referencePeriodEndDate.
In the example above records with identifier 14
and 19
would be returned as the status change dates they have 19/09/2018
and 19/09/2016
are after and before the reference window respectively.
Example output
    identifier  department  organisation  status change date
1           14     Finance      Accounts          19/09/2018
2           19   Marketing   Advertising          19/09/2016
I have tried the following
resultdf = (df['status change date'].dt.date > referencePeriodEndDate.dt.date) & (df['status change date'].dt.date < referencePeriodStartDate.dt.date)
Here I convert the string dates to type date and try to apply this logic: if the status change date is smaller than referencePeriodStartDate
and status change date > referencePeriodEndDate
then return the row.
My problem is that nothing is returned. Have I converted to type date incorrectly?
Solution 1:[1]
If you want to compare the dates of a column (created by .dt.date) with a scalar date, the scalar also needs date():
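import pandas as pd

# the sample dates are written day first (dd/mm/yyyy), so say so explicitly
df['status change date'] = pd.to_datetime(df['status change date'], dayfirst=True)
referencePeriodStartDate = pd.to_datetime('01/01/2017', dayfirst=True)
referencePeriodEndDate = pd.to_datetime('30/03/2017', dayfirst=True)
# use | (or), not & (and): a date cannot be after the end and before the start
resultdf = df[(df['status change date'].dt.date > referencePeriodEndDate.date()) |
              (df['status change date'].dt.date < referencePeriodStartDate.date())]
print(resultdf)
    identifier  department  organisation  status change date
1           14     Finance      Accounts          2018-09-19
2           19   Marketing   Advertising          2016-09-19
54         790   Relations         Sales          2017-03-31
Row 54 is returned as well, because 31/03/2017 is one day after the end date 30/03/2017.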
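The dates in the question are written day first, and one of them (03/01/2017) is ambiguous. The following is a minimal sketch of how the parsing choice changes the result for that value; the variable names are hypothetical and only serve the illustration.

```python
import pandas as pd

ambiguous = "03/01/2017"  # value taken from the sample data in the question

# Without hints, pandas reads an ambiguous string month first.
month_first = pd.to_datetime(ambiguous)
# dayfirst=True asks for the dd/mm/yyyy reading instead.
day_first = pd.to_datetime(ambiguous, dayfirst=True)
# An explicit format is the strictest option: other layouts raise an error.
explicit = pd.to_datetime(ambiguous, format="%d/%m/%Y")

print(month_first.date())  # 2017-03-01
print(day_first.date())    # 2017-01-03
print(explicit.date())     # 2017-01-03
```

For this particular data set both readings of 03/01/2017 fall inside the reference window, so the filtered result is the same, but that is a coincidence and would not hold for other dates.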
Or, to compare datetimes directly, just remove the date conversions, or use between with the condition inverted by ~:
df['status change date'] = pd.to_datetime(df['status change date'], dayfirst=True)
# parse the reference strings explicitly rather than relying on implicit string parsing
referencePeriodStartDate = pd.to_datetime('01/01/2017', dayfirst=True)
referencePeriodEndDate = pd.to_datetime('30/03/2017', dayfirst=True)
resultdf = df[(df['status change date'] > referencePeriodEndDate) |
              (df['status change date'] < referencePeriodStartDate)]
print(resultdf)
    identifier  department  organisation  status change date
1           14     Finance      Accounts          2018-09-19
2           19   Marketing   Advertising          2016-09-19
54         790   Relations         Sales          2017-03-31
# between is inclusive at both ends by default, so ~ keeps only rows outside the window
mask = ~df['status change date'].between(referencePeriodStartDate, referencePeriodEndDate)
resultdf = df[mask]
print(resultdf)
    identifier  department  organisation  status change date
1           14     Finance      Accounts          2018-09-19
2           19   Marketing   Advertising          2016-09-19
54         790   Relations         Sales          2017-03-31
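Since the question says the two reference dates arrive as command-line strings, here is a self-contained sketch of the whole flow under that assumption. The frame is rebuilt from the sample rows in the question, and DATE_FORMAT, start and end are names introduced only for this example.

```python
import pandas as pd

# Sample rows copied from the question; the index mirrors the one shown there.
df = pd.DataFrame(
    {
        "identifier": [14, 19, 288, 352, 790],
        "department": ["Finance", "Marketing", "Production", "Facilities", "Relations"],
        "organisation": ["Accounts", "Advertising", "IT", "Kitchen", "Sales"],
        "status change date": [
            "19/09/2018", "19/09/2016", "03/01/2017", "31/01/2017", "31/03/2017",
        ],
    },
    index=[1, 2, 22, 27, 54],
)

DATE_FORMAT = "%d/%m/%Y"  # assumption: every date is written dd/mm/yyyy

# Parse the column and both reference strings with the same explicit format.
df["status change date"] = pd.to_datetime(df["status change date"], format=DATE_FORMAT)
start = pd.to_datetime("01/01/2017", format=DATE_FORMAT)
end = pd.to_datetime("30/03/2017", format=DATE_FORMAT)

# between() is inclusive at both ends; ~ inverts it to select rows outside the window.
inside = df["status change date"].between(start, end)
resultdf = df[~inside]

print(resultdf["identifier"].tolist())  # [14, 19, 790]
```

Using one explicit format for both the column and the scalars avoids any dependence on how pandas guesses the date layout.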
Solution 2:[2]
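As the code from Jezrael shows, the problem is that you are combining the two conditions with '&'. A date cannot be after the end date and before the start date at the same time, so the combined condition is never true and no rows are returned. Convert the strings to datetimes and then combine the conditions with '|' (element-wise or); Python's plain 'or' keyword does not work on a pandas Series.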
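To make the difference concrete, this small sketch evaluates both combinations on three of the sample dates. The names dates, both and either are hypothetical, chosen only for the illustration.

```python
import pandas as pd

# Three of the sample dates: one after, one before, one inside the window.
dates = pd.Series(
    pd.to_datetime(["19/09/2018", "19/09/2016", "31/01/2017"], format="%d/%m/%Y")
)
start = pd.Timestamp("2017-01-01")
end = pd.Timestamp("2017-03-30")

# '&' requires a date to be after the end AND before the start: never true.
both = (dates > end) & (dates < start)
# '|' requires a date to be after the end OR before the start: true outside the window.
either = (dates > end) | (dates < start)

print(both.tolist())    # [False, False, False]
print(either.tolist())  # [True, True, False]
```

The parentheses around each comparison matter, because '&' and '|' bind more tightly than '>' and '<' in Python.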
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | rko |