How can I create 2 new columns, one that looks up a prior match and one that shows the next match?

I have a dataset that looks similar to the one below and would like to create two new columns. The first column would return the prior year, or NaN if Count is 0. The second column would return the next year, or the current year plus 4 if there is no next year.

data table:

NAME YEAR COUNT
Foo 2012 0
Foo 2017 1
Foo 2022 2
Bar 2015 0
Bar 2024 1

Goal Results:

Name Year Count Prior Next
Foo 2012 0 NAN 2017
Foo 2017 1 2012 2022
Foo 2022 2 2017 2026
Bar 2015 0 NAN 2024
Bar 2024 1 2015 2028

In Excel I would just use VLOOKUP or something similar to return the values. I have no idea how to do this in Python.

I have tried making a duplicate dataframe and then using pd.merge, but I keep getting an error.



Solution 1:[1]

setup

import pandas as pd

df = pd.DataFrame(
    {
        "Name":["Foo","Foo","Foo","Bar","Bar"],
        "Year":[2012, 2017, 2022, 2015, 2024],
        "Count":[0,1,2,0,1]
    }
)

solution

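def make_data(df_sub):
    # Work on a copy so the group passed in by groupby is not modified in place.
    df_sub = df_sub.copy()
    # Sorted unique years for this name; shifting pairs each year with its neighbor.
    years = pd.Series(df_sub["Year"].sort_values().unique())
    # Previous year, blanked out wherever Count is 0.
    df_sub["Prior"] = df_sub["Year"].map(dict(zip(years, years.shift()))).mask(df_sub["Count"] == 0)
    # Following year, falling back to Year + 4 when there is none.
    df_sub["Next"] = df_sub["Year"].map(dict(zip(years, years.shift(-1)))).fillna(df_sub["Year"] + 4)
    return df_sub

# group_keys=False keeps the original flat index instead of adding Name as an index level.
df.groupby("Name", group_keys=False).apply(make_data)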

This gives you

  Name  Year  Count   Prior    Next
0  Foo  2012      0     NaN  2017.0
1  Foo  2017      1  2012.0  2022.0
2  Foo  2022      2  2017.0  2026.0
3  Bar  2015      0     NaN  2024.0
4  Bar  2024      1  2015.0  2028.0

The solution splits the dataframe by Name. For each sub-dataframe it sorts the unique years and builds two maps: one from each year to the previous year, and one from each year to the next year. For the Prior column it masks any rows where Count is 0. For the Next column it fills missing values with the year + 4.
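To make the two maps concrete, here is a small sketch of what they contain for a single group. The years are the "Foo" rows from the setup data; the names prior_map and next_map are only for illustration and do not appear in the answer's code.

```python
import pandas as pd

# Sorted unique years for one group (the "Foo" rows in the setup data).
years = pd.Series([2012, 2017, 2022])

# shift() moves values down one slot, so each year lines up with the year before it.
# shift(-1) moves values up one slot, so each year lines up with the year after it.
prior_map = dict(zip(years, years.shift()))
next_map = dict(zip(years, years.shift(-1)))

print(prior_map)  # first year has no predecessor, so its value is NaN
print(next_map)   # last year has no successor, so its value is NaN
```

The NaN left at the end of next_map is what fillna later replaces with the year + 4. Because shift() introduces NaN, the mapped values are floats.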

Solution 2:[2]

Thanks for the setup df code from @Riley.

We can use numpy.where.

import numpy as np

df["Prior"] = np.where(df.groupby("Name")["Count"].shift(1).isnull(), np.nan, df["Year"].shift(1))
df["Next"] = np.where(df.groupby("Name")["Count"].shift(-1).isnull(), df["Year"] + 4, df["Year"].shift(-1))

> df
    Name    Year    Count   Prior   Next
0   Foo     2012    0       NaN     2017.0
1   Foo     2017    1       2012.0  2022.0
2   Foo     2022    2       2017.0  2026.0
3   Bar     2015    0       NaN     2024.0
4   Bar     2024    1       2015.0  2028.0
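Note that the two np.where lines shift Year across the whole frame, so they rely on the rows already being grouped by name and sorted by year. If that is not guaranteed, one possible variant is to sort first and shift Year within each group. This is only a sketch: the shuffled row order below is made up for illustration, and like the np.where version it assumes that a Count of 0 always marks a name's earliest year, as in the sample data.

```python
import pandas as pd

# Same rows as the setup data, deliberately shuffled.
df = pd.DataFrame(
    {
        "Name": ["Foo", "Bar", "Foo", "Bar", "Foo"],
        "Year": [2022, 2024, 2012, 2015, 2017],
        "Count": [2, 1, 0, 0, 1],
    }
)

# Sort so each name's rows are contiguous and in year order.
df = df.sort_values(["Name", "Year"]).reset_index(drop=True)

# Shifting within each group leaves NaN at the group edges automatically.
df["Prior"] = df.groupby("Name")["Year"].shift(1)
df["Next"] = df.groupby("Name")["Year"].shift(-1).fillna(df["Year"] + 4)

print(df)
```

After sorting, the Bar rows come before the Foo rows, so the row order differs from the output above even though the Prior and Next values per row are the same.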

Because np.nan is a float, both new columns end up with a float dtype as well.
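If whole-number years are preferred, one option is pandas' nullable integer dtype, which can hold missing values alongside integers. The sketch below uses a standalone series with the same values as the Prior column for "Foo"; the name prior_int is only for illustration.

```python
import numpy as np
import pandas as pd

# A float column with a missing value, like Prior in the output above.
prior = pd.Series([np.nan, 2012.0, 2017.0], name="Prior")

# "Int64" (capital I) is the nullable integer dtype; NaN is stored as <NA>.
prior_int = prior.astype("Int64")

print(prior_int.dtype)
print(prior_int)
```

The same astype("Int64") call should work on df["Prior"] and df["Next"], provided every non-missing value is a whole number.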

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Denny Chen