'Categorise the datetime in to a new column based on the range
I have a dataset called df that looks like this:
| provider | fid | pid | datetime |
|---|---|---|---|
| CHE-223 | 2bfc9a62 | 2f43d557 | 2021-09-26T23:18:00 |
| CHE-223 | fff669e9 | 295b82e2 | 2021-08-13T09:10:00 |
I wanted to create a new table called wave that has categorical values for a range of date from datetime. e.g. For the date from 16th of November 2019 until 28th of February 2020, it gives a value before covid and so on.
I used a loop function to achieve this and this is the code I used:
def wave(row):
if (row["datetime"] <= pd.Timestamp("2019-11-16")) & (row["datetime"] >= pd.Timestamp("2020-02-28")):
wave="before covid"
elif (row["datetime"] <= pd.Timestamp("2020-03-01")) & (row["datetime"] >= pd.Timestamp("2020-06-15")):
wave="1st wave"
elif (row["datetime"] <= pd.Timestamp("2020-06-16")) & (row["datetime"] >= pd.Timestamp("2020-09-30")):
wave="between waves"
elif (row["datetime"] <= pd.Timestamp("2020-10-01")) & (row["datetime"] >= pd.Timestamp("2021-01-15")):
wave="2nd wave"
df["wave"]=df.apply(lambda row:wave(row),axis=1)
But it gives me a column named wave but with no values. How do I fix this and categorise the date?
Solution 1:[1]
Your function needs to return something. Also your dates comparisons are inverted:
(row["datetime"] <= pd.Timestamp("2019-11-16")) & (row["datetime"] >= pd.Timestamp("2020-02-28"))
would match dates that are before the 16th of November 2019 and at the same time after the 28th of February 2020... which of course never happens.
Your function should look like:
def wave(row):
wave = ""
if (row["datetime"] >= pd.Timestamp("2019-11-16")) and (row["datetime"] <= pd.Timestamp("2020-02-28")):
wave="before covid"
elif (row["datetime"] >= pd.Timestamp("2020-03-01")) and (row["datetime"] <= pd.Timestamp("2020-06-15")):
wave="1st wave"
elif (row["datetime"] >= pd.Timestamp("2020-06-16")) and (row["datetime"] <= pd.Timestamp("2020-09-30")):
wave="between waves"
elif (row["datetime"] >= pd.Timestamp("2020-10-01")) and (row["datetime"] <= pd.Timestamp("2021-01-15")):
wave="2nd wave"
elif (row["datetime"] >= pd.Timestamp("2021-01-16")):
wave="after second wave"
return wave
Edit: also & is a bit-wise operator. For logical expressions use and.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
