YOY growth based on ID
I am trying to calculate year-over-year growth for a variable in a pandas DataFrame. My data looks like this:
| Year | Country | Industry | Value |
|---|---|---|---|
| 2000 | USA | Manufacturing | 5 |
| 2000 | Mexico | Manufacturing | 10 |
| 2001 | Mexico | Manufacturing | 15 |
| 2002 | Mexico | Other | 20 |
The number of observations differs by Country and Industry. Expected output:
| Year | Country | Industry | Value | YOY |
|---|---|---|---|---|
| 2000 | USA | Manufacturing | 5 | NaN |
| 2000 | Mexico | Manufacturing | 10 | NaN |
| 2001 | Mexico | Manufacturing | 15 | 50% |
| 2002 | Mexico | Other | 20 | NaN |
I tried different things including:
df.groupby(['Country','Industry','Year'])['Value'].pct_change()
df['YOY'] = (df['Value'] - df.sort_values(by=['Country','Industry','Year']).groupby(['Country','Industry'])['Value'].shift(1)) / df['Value']
The first line calculates growth between consecutive rows without resetting for a new Country or Industry. The second one gives inconsistent results.
Any lead I could take? Thanks!!
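One possible approach, as a minimal sketch rather than a definitive answer: group by the ID columns only (`Country` and `Industry`, not `Year`), so that each group is a single time series, and divide the change by the previous year's value rather than the current one. The sketch rebuilds the sample data from the table above and assumes each Country/Industry pair has at most one row per year with no gaps; if years are missing, `shift(1)` compares against the previous available observation instead.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2002],
    "Country": ["USA", "Mexico", "Mexico", "Mexico"],
    "Industry": ["Manufacturing", "Manufacturing", "Manufacturing", "Other"],
    "Value": [5, 10, 15, 20],
})

# Sort so rows within each Country/Industry pair are in chronological order.
df = df.sort_values(["Country", "Industry", "Year"])

# Previous observation within each Country/Industry group.
# Year is deliberately left out of the group keys; including it would
# put every row in its own group, leaving nothing to compare against.
prev = df.groupby(["Country", "Industry"])["Value"].shift(1)

# Growth relative to the previous value; NaN where a group has no earlier row.
df["YOY"] = (df["Value"] - prev) / prev

# Restore the original row order.
df = df.sort_index()
print(df)
```

Here `YOY` is a fraction (0.5 for the Mexico/Manufacturing row in 2001), so multiply by 100 if a percentage is needed. On recent pandas versions, `df.groupby(['Country','Industry'])['Value'].pct_change()` applied to the sorted frame should give the same result.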
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
