'Analytics question to define status that depends on date
Date status output
7/8/2021 12:00:00 AM latest
8/11/2021 12:00:00 AM latest
2/11/2022 12:00:00 AM latest
2/1/2022 12:00:00 AM old
Date status output
7/8/2021 12:00:00 AM latest not use
8/11/2021 12:00:00 AM latest not use
2/11/2022 12:00:00 AM latest use
2/1/2022 12:00:00 AM old not use
Compare the date in the Date column with the today's date, if it is less than equal to 6 months old and its status is latest, then the output should be use, otherwise not use.
Solution 1:[1]
Let's first build our data :
import pandas as pd
from io import StringIO
csvfile = StringIO(
"""Date\tstatus\toutput
7/8/2021 12:00:00 AM\tlatest
8/11/2021 12:00:00 AM\tlatest
2/11/2022 12:00:00 AM\tlatest
2/1/2022 12:00:00 AM\told""")
df = pd.read_csv(csvfile, sep = '\t', engine='python')
Ensure Date column contains date formatting
df.Date = pd.to_datetime(df.Date, format = "%m/%d/%Y %H:%M:%S %p")
df
Out[344]:
Date status output
0 2021-07-08 12:00:00 latest NaN
1 2021-08-11 12:00:00 latest NaN
2 2022-02-11 12:00:00 latest NaN
3 2022-02-01 12:00:00 old NaN
Then apply the logic you said, True if latest and if Date + 6 months is greater than today (which is the same as said it's less than 6 months old) :
mask_date = (df.Date + pd.DateOffset(months=6) > pd.to_datetime("today")) & (df.status == 'latest')
mask_date
Out[347]:
0 False
1 False
2 True
3 False
dtype: bool
Finally just map your mask :
df.loc[:, "output"] = mask_date.map(lambda x : "use" if x else "not used")
df
Out[350]:
Date status output
0 2021-07-08 12:00:00 latest not used
1 2021-08-11 12:00:00 latest not used
2 2022-02-11 12:00:00 latest use
3 2022-02-01 12:00:00 old not used
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Zelemist |
