Filter data-frame for rows with dates outside a date range

I have a data-frame df where the head looks like:

    identifier        department    organisation    status change date
1           14           Finance        Accounts            19/09/2018
2           19         Marketing     Advertising            19/09/2016
22         288        Production              IT            03/01/2017
27         352        Facilities         Kitchen            31/01/2017
54         790         Relations           Sales            31/03/2017

df has several thousand records in it. I also have 2 date variables - the start date and end date of a reference period as strings (arguments from the command line) called:

referencePeriodStartDate and referencePeriodEndDate

which currently equal:

referencePeriodStartDate = 01/01/2017
referencePeriodEndDate = 30/03/2017

I am trying to return any records from df whose status change date falls outside the reference period defined by referencePeriodStartDate and referencePeriodEndDate.

In the example above, the records with identifier 14 and 19 would be returned, as their status change dates (19/09/2018 and 19/09/2016) are after and before the reference window respectively.

Example output

    identifier        department    organisation    status change date
1           14           Finance        Accounts            19/09/2018
2           19         Marketing     Advertising            19/09/2016

I have tried the following

resultdf = (df['status change date'].dt.date > referencePeriodEndDate.dt.date) & (df['status change date'].dt.date < referencePeriodStartDate.dt.date)

Here I convert the string dates to type date and try to apply the logic: if the status change date is earlier than referencePeriodStartDate and the status change date is later than referencePeriodEndDate, then return the row.

My problem is that nothing is returned. Have I converted to type date incorrectly?



Solution 1:[1]

To compare the dates in a datetime column with a scalar date, convert both sides to date objects: use .dt.date on the column and .date() on the scalar:

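import pandas as pd

# dayfirst=True because the dates are written day/month/year
df['status change date'] = pd.to_datetime(df['status change date'], dayfirst=True)
referencePeriodStartDate = pd.to_datetime('01/01/2017', dayfirst=True)
referencePeriodEndDate = pd.to_datetime('30/03/2017', dayfirst=True)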

resultdf = df[(df['status change date'].dt.date > referencePeriodEndDate.date()) | 
              (df['status change date'].dt.date < referencePeriodStartDate.date())]
print (resultdf)
    identifier department organisation status change date
1           14    Finance     Accounts         2018-09-19
2           19  Marketing  Advertising         2016-09-19
54         790  Relations        Sales         2017-03-31

Alternatively, compare the datetimes directly (drop .dt.date and .date()), or use between and invert the mask with ~:

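# dayfirst=True because the dates are written day/month/year
df['status change date'] = pd.to_datetime(df['status change date'], dayfirst=True)
referencePeriodStartDate = pd.to_datetime('01/01/2017', dayfirst=True)
referencePeriodEndDate = pd.to_datetime('30/03/2017', dayfirst=True)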

resultdf = df[(df['status change date'] > referencePeriodEndDate) |
              (df['status change date'] < referencePeriodStartDate)]
print (resultdf)
    identifier department organisation status change date
1           14    Finance     Accounts         2018-09-19
2           19  Marketing  Advertising         2016-09-19
54         790  Relations        Sales         2017-03-31

mask = ~df['status change date'].between(referencePeriodStartDate, referencePeriodEndDate)
resultdf = df[mask]
print (resultdf)
    identifier department organisation status change date
1           14    Finance     Accounts         2018-09-19
2           19  Marketing  Advertising         2016-09-19
54         790  Relations        Sales         2017-03-31
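
Since the question says the two dates arrive as command-line strings, it may be safer to parse them with an explicit format rather than rely on pandas guessing the day/month order. The following is a minimal sketch under that assumption; the helper name rows_outside_period and the DATE_FORMAT constant are hypothetical and not part of the original answer, and the sample frame reuses the rows from the question.

```python
import pandas as pd

DATE_FORMAT = "%d/%m/%Y"  # assumption: every date is written day/month/year


def rows_outside_period(df, column, start_str, end_str):
    """Return the rows of df whose date in `column` lies outside [start, end]."""
    dates = pd.to_datetime(df[column], format=DATE_FORMAT)
    start = pd.to_datetime(start_str, format=DATE_FORMAT)
    end = pd.to_datetime(end_str, format=DATE_FORMAT)
    # between() includes both end points by default, so ~between() keeps
    # only dates strictly before start or strictly after end.
    return df[~dates.between(start, end)]


# Sample rows copied from the question (same index labels).
df = pd.DataFrame(
    {
        "identifier": [14, 19, 288, 352, 790],
        "status change date": [
            "19/09/2018", "19/09/2016", "03/01/2017", "31/01/2017", "31/03/2017",
        ],
    },
    index=[1, 2, 22, 27, 54],
)

resultdf = rows_outside_period(df, "status change date", "01/01/2017", "30/03/2017")
print(resultdf)
```

With an explicit format, a malformed argument raises an error at parse time instead of being silently read as month/day.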

Solution 2:[2]

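As Jezrael's code shows, the problem is that you are combining the two conditions with '&'. A date cannot be after the end date and, at the same time, before the start date, so the mask is always False and nothing is returned. Convert the strings to a date type and then combine the conditions with '|' (element-wise OR); Python's 'or' keyword does not work on a pandas Series.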
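
The difference between the two operators can be seen on the sample rows from the question. This is only an illustrative sketch; the variable names (start, end, after_end, before_start) are chosen here for the example and do not come from the original code.

```python
import pandas as pd

# Sample rows copied from the question; dates are day/month/year strings.
df = pd.DataFrame(
    {
        "identifier": [14, 19, 288, 352, 790],
        "status change date": [
            "19/09/2018", "19/09/2016", "03/01/2017", "31/01/2017", "31/03/2017",
        ],
    }
)
df["status change date"] = pd.to_datetime(df["status change date"], format="%d/%m/%Y")

start = pd.Timestamp(2017, 1, 1)
end = pd.Timestamp(2017, 3, 30)

after_end = df["status change date"] > end
before_start = df["status change date"] < start

# '&' requires both conditions at once, which no date can satisfy.
both = df[after_end & before_start]
# '|' keeps rows that satisfy either condition, i.e. rows outside the window.
either = df[after_end | before_start]

print(len(both))
print(either)
```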

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 rko