'finding percentage of a conditional group from the total at a column related to an event range from different column in pandas
I have a Dataset like below :
How can I find the percentage of employees terminated at 2022 that First hire at 2021 from total employee using pandas?
| employee No | Event date | Event Description |
|---|---|---|
| 102 | 2021-10-12 | First Hire |
| 103 | 2021-11-02 | First Hire |
| 102 | 2022-01-01 | Terminated |
| 102 | 2021-12-12 | Shift Change |
| 101 | 2021-12-03 | First Hire |
| 103 | 2021-11-05 | Terminated |
| 101 | 2021-12-04 | Terminated |
| 105 | 2022-02-26 | First Hire |
| 106 | 2022-02-26 | First Hire |
| 102 | 2022-03-29 | Second Hire |
| 107 | 2021-05-04 | First Hire |
| 108 | 2022-04-04 | First Hire |
| 109 | 2022-03-03 | Terminated |
| 109 | 2021-12-29 | First Hire |
| 109 | 2022-04-01 | Second Hire |
| 109 | 2022-01-10 | Shift Change |
at the sample dataset Desired Condition belong to employee 109 and 102 how can find the condition at a dataset with 2M records .
Solution 1:[1]
# Convert Event Date to Timestamp if it's not Timestamp already
df["Event date"] = pd.to_datetime(df["Event date"])
# First hired in 2021
cond1 = df["Event date"].dt.year.eq(2021) & df["Event Description"].eq("First Hire")
# Terminated in 2022 & first hired in 2021
cond2 = (
df["Event date"].dt.year.eq(2022)
& df["Event Description"].eq("Terminated")
& df["employee No"].isin(df.loc[cond1, "employee No"])
)
df[cond2]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Code Different |
