'Python: remove all rows from a dataframe where the date is over 2 years ago
EDIT - Answer as below with the extra code new_df = df_hist[msk]
I'm trying to remove all rows from a DataFrame where the date ('RaceDate') is over 2 years ago. I convert the 'RaceDate' using pd.to_datetime - this gives the format e.g. 2018-09-01. I use datetime.datetime.now().date() and this gives the format datetime.date(2022, 2, 10)
I can't work out how to check the difference is over two years as the code gives me Key error. Any help would be most welcome.
d2 = datetime.datetime.now().date()
df_hist['RaceDate'] = pd.to_datetime(df_hist['RaceDate'])
if df_hist.loc[df_hist['RaceDate']] - d2 > 730:
df_hist.loc[df_hist['RaceDate']] = ""
Sample data:
Jockey Code Course RaceDate Ran FPos TotalBtn
Harry Burns AW Chelmsford City 01-Sep-18 9 4 4.5
Harry Burns AW Newcastle 01-Oct-21 14 6 4.75
Harry Burns AW Kempton 01-Nov-21 11 4 4.25
Harry Burns AW Southwell 01-Jan-22 12 4 3.75
Harry Burns AW Wolverhampton 01-Feb-22 12 5 4.75
Christian Howarth AW Kempton 01-Dec-21 14 5 3.5
Harry Davies AW
Jonathan Fisher AW Southwell 01-Dec-17 14 9 10.5
Jonathan Fisher AW Chelmsford City 01-Dec-17 10 8 4.82
Jonathan Fisher AW Southwell 01-Jan-22 12 5 7.75
Jonathan Fisher AW Southwell 01-Jan-22 14 4 3
Jonathan Fisher AW Wolverhampton 01-Feb-22 7 4 2
Jonathan Fisher AW Wolverhampton 01-Feb-22 11 8 3
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
