How can I create two new columns, one that looks up the prior match and one that shows the next match?
I have a dataset that looks similar to the one below and would like to create two new columns. The first column would return the prior year, or NaN if the count is 0. The second column would return the next year, or the current year plus 4 if there is no next year.
Data table:
| NAME | YEAR | COUNT |
|---|---|---|
| Foo | 2012 | 0 |
| Foo | 2017 | 1 |
| Foo | 2022 | 2 |
| Bar | 2015 | 0 |
| Bar | 2024 | 1 |
Goal Results:
| Name | Year | Count | Prior | Next |
|---|---|---|---|---|
| Foo | 2012 | 0 | NAN | 2017 |
| Foo | 2017 | 1 | 2012 | 2022 |
| Foo | 2022 | 2 | 2017 | 2026 |
| Bar | 2015 | 0 | NAN | 2024 |
| Bar | 2024 | 1 | 2015 | 2028 |
In Excel I would just use VLOOKUP or something similar to return the values. I have no idea how to do this in Python.
I have tried making a duplicate dataframe and then using pd.merge, but I keep getting an error.
Solution 1:[1]
setup
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Foo", "Foo", "Foo", "Bar", "Bar"],
        "Year": [2012, 2017, 2022, 2015, 2024],
        "Count": [0, 1, 2, 0, 1],
    }
)
solution
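def make_data(df_sub):
    # Work on a copy so the group passed in by groupby is not modified in place.
    df_sub = df_sub.copy()
    # Sorted, de-duplicated years for this Name.
    years = pd.Series(df_sub["Year"].sort_values().unique())
    # Map each year to the one before it, then blank out rows where Count is 0.
    df_sub["Prior"] = df_sub["Year"].map(dict(zip(years, years.shift()))).mask(df_sub["Count"] == 0)
    # Map each year to the one after it; where there is none, use Year + 4.
    df_sub["Next"] = df_sub["Year"].map(dict(zip(years, years.shift(-1)))).fillna(df_sub["Year"] + 4)
    return df_sub

# group_keys=False keeps the original row index instead of adding Name as an index level.
df.groupby("Name", group_keys=False).apply(make_data)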
This gives you
Name Year Count Prior Next
0 Foo 2012 0 NaN 2017.0
1 Foo 2017 1 2012.0 2022.0
2 Foo 2022 2 2017.0 2026.0
3 Bar 2015 0 NaN 2024.0
4 Bar 2024 1 2015.0 2028.0
The solution splits the dataframe by Name. For each sub-dataframe it sorts the years and builds two maps, one from each year to the previous year and one from each year to the next year. For the `Prior` column it masks any rows where `Count` is 0. For the `Next` column it fills missing values with the year + 4.
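To make the mapping step more concrete, here is a small sketch of what the two maps look like for a single group. It hard-codes the years of the "Foo" group from the setup; the names `prior_map` and `next_map` are only for illustration and do not appear in the answer.

```python
import pandas as pd

# Years for the "Foo" group, already sorted and unique.
years = pd.Series([2012, 2017, 2022])

# shift() moves the values down one slot, so each year is paired with the year before it.
# The first year has nothing before it and is paired with NaN.
prior_map = dict(zip(years, years.shift()))

# shift(-1) moves the values up one slot, so each year is paired with the year after it.
# The last year has nothing after it and is paired with NaN.
next_map = dict(zip(years, years.shift(-1)))

print(prior_map)
print(next_map)
```

The NaN entries are the ones that `mask` and `fillna` then deal with: `Prior` stays empty, while `Next` is replaced by the year + 4.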
Solution 2:[2]
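Thanks to @Riley for the setup code.
We can use numpy.where. This relies on the rows already being grouped by Name and sorted by Year, as they are in the setup, because df["Year"].shift() is not applied per group.
import numpy as np

df["Prior"] = np.where(df.groupby("Name")["Count"].shift(1).isnull(), np.nan, df["Year"].shift(1))
df["Next"] = np.where(df.groupby("Name")["Count"].shift(-1).isnull(), df["Year"] + 4, df["Year"].shift(-1))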
> df
Name Year Count Prior Next
0 Foo 2012 0 NaN 2017.0
1 Foo 2017 1 2012.0 2022.0
2 Foo 2022 2 2017.0 2026.0
3 Bar 2015 0 NaN 2024.0
4 Bar 2024 1 2015.0 2028.0
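If the rows are not already in order, one possible variation is to sort first and shift `Year` within each group directly. This is a sketch rather than part of either answer: the unsorted input and the names `out`, `prev_year`, and `next_year` are assumptions made for the example, and the `Count == 0` rule is taken from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Foo", "Foo", "Foo", "Bar", "Bar"],
        "Year": [2022, 2012, 2017, 2024, 2015],  # deliberately unsorted
        "Count": [2, 0, 1, 1, 0],
    }
)

# Sort so that shifting within each Name walks the years in ascending order.
out = df.sort_values(["Name", "Year"]).reset_index(drop=True)

# Shift Year inside each Name group, so values never leak across names.
by_name = out.groupby("Name")["Year"]
prev_year = by_name.shift(1)
next_year = by_name.shift(-1)

# Prior: previous year in the group, left empty where Count is 0.
out["Prior"] = prev_year.mask(out["Count"] == 0)
# Next: following year in the group, or Year + 4 for the last row of each group.
out["Next"] = next_year.fillna(out["Year"] + 4)

print(out)
```

Note that sorting puts the "Bar" rows before the "Foo" rows, so the row order differs from the output above even though the values per row are the same.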
Because np.nan is a float, the Prior and Next columns are float as well.
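If whole-number years are preferred, one option is to cast the columns to pandas' nullable integer dtype afterwards. The sketch below uses a small hand-built frame named `result` (an assumption for the example) that stands in for the float columns produced above.

```python
import numpy as np
import pandas as pd

# Float columns shaped like the Prior and Next results above.
result = pd.DataFrame(
    {
        "Prior": [np.nan, 2012.0, 2017.0],
        "Next": [2017.0, 2022.0, 2026.0],
    }
)

# "Int64" (capital I) is the nullable integer dtype; NaN becomes <NA>.
result = result.astype({"Prior": "Int64", "Next": "Int64"})

print(result.dtypes)
```

The cast only works when every non-missing value is a whole number, which is the case for years.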
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Denny Chen |
