Something strange is happening with groupby and agg functions in Python pandas
I have a dataset that looks like this:
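
| A | B | C | year | CompanyName | Sector |
|---|---|---|---|---|---|
| 1 | NaN | 1 | 1999 | tesla | 10 |
| 4 | 3 | 4 | 2000 | tesla | 10 |
| NaN | NaN | 7 | 2001 | tesla | 10 |
| 2 | NaN | 8 | 2002 | tesla | 10 |
| 3 | NaN | 10 | 1999 | BMW | 12 |
| 2 | -1 | 234 | 2000 | BMW | 12 |
| 2 | NaN | 548 | 2002 | BMW | 12 |
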
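To make the example reproducible, the sample table can be built as a DataFrame. This is only a sketch: the name `finalData` is taken from the code further down, and spelling the sector column as `Sector` (so that it matches the `groupby` call) is an assumption.

```python
import numpy as np
import pandas as pd

# Sample rows from the table above; np.nan marks the missing values
finalData = pd.DataFrame({
    'A': [1, 4, np.nan, 2, 3, 2, 2],
    'B': [np.nan, 3, np.nan, np.nan, np.nan, -1, np.nan],
    'C': [1, 4, 7, 8, 10, 234, 548],
    'year': [1999, 2000, 2001, 2002, 1999, 2000, 2002],
    'CompanyName': ['tesla'] * 4 + ['BMW'] * 3,
    'Sector': [10] * 4 + [12] * 3,
})
print(finalData)
```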
Column B is the difference between two consecutive years of column A for the same company: `B(n) = A(n) - A(n-1)`.
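A minimal sketch of that rule on the sample rows: B is recomputed from A with a per-company difference and kept only when the previous row of the same company is exactly one year earlier. Using `groupby(...).diff()` instead of a plain `diff()` is a substitution on my part; on data where each company's rows are contiguous and sorted by year it should give the same B as the code below.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 4, np.nan, 2, 3, 2, 2],
    'year': [1999, 2000, 2001, 2002, 1999, 2000, 2002],
    'CompanyName': ['tesla'] * 4 + ['BMW'] * 3,
})

g = df.groupby('CompanyName')
# Previous year within the same company (NaN for each company's first row)
prev_year = g['year'].shift()
# A(n) - A(n-1) within each company, kept only when the years are consecutive
df['B'] = g['A'].diff().where(df['year'].eq(prev_year + 1))
print(df)
```

This reproduces the B column from the table: 3 for Tesla 2000, -1 for BMW 2000, and NaN elsewhere (BMW 2002 is masked because 2001 is missing).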
I calculate a new column D as `D(n) = (B(n) - C(n)) / B(n)`.
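As a worked check on the sample rows: only Tesla 2000 (B = 3, C = 4) and BMW 2000 (B = -1, C = 234) have a B value, so D = (3 - 4) / 3 = -1/3 and D = (-1 - 234) / (-1) = 235. The sketch below reproduces this, with B copied from the table.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'B': [np.nan, 3, np.nan, np.nan, np.nan, -1, np.nan],
    'C': [1, 4, 7, 8, 10, 234, 548],
})
# D = (B - C) / B; NaN propagates automatically wherever B is missing
df['D'] = (df['B'] - df['C']) / df['B']
print(df)
```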
After calculating all these columns, I group by sector and year so that my data looks like this:

| Sector | year | Amean | Bmean | Cmean | Dmean | Dmedian |
|---|---|---|---|---|---|---|
| 10 | 2000 | … | … | … | … | … |
| 10 | 2001 | … | … | … | … | … |
| … | … | … | … | … | … | … |

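Passing a dict of lists to `agg`, as the code below does, produces two-level column labels such as `('D', 'mean')`. If flat names like those in the table above are wanted, pandas' named aggregation (available since pandas 0.25) is one way to get them. The three rows here are made up purely to illustrate the layout.

```python
import pandas as pd

# Hypothetical rows, only to illustrate the output layout; D = (B - C) / B
df = pd.DataFrame({
    'Sector': [10, 10, 12],
    'year': [2000, 2000, 2000],
    'A': [1.0, 3.0, 2.0],
    'B': [2.0, 4.0, -1.0],
    'C': [5.0, 7.0, 234.0],
    'D': [-1.5, -0.75, 235.0],
})

# Named aggregation: output column name = (input column, aggregation)
out = (df.groupby(['Sector', 'year'])
         .agg(Amean=('A', 'mean'), Bmean=('B', 'mean'), Cmean=('C', 'mean'),
              Dmean=('D', 'mean'), Dmedian=('D', 'median'))
         .reset_index())
print(out)
```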
The strange thing is that many Dmean values are missing: the Dmean column contains far too many `np.nan` values, even where Dmedian is a number, while all the other aggregated values are present. What am I doing wrong? Here is my code:

    import numpy

    g = finalData.groupby('CompanyName')
    # The year is shifted within each company and we add one, so that
    # only consecutive years are subtracted
    finalData['B'] = finalData['A'].diff().where(finalData['year'].eq(g['year'].shift() + 1))
    # D = (B - C) / B, kept as NaN wherever B is missing
    finalData['D'] = numpy.where(finalData.B.notnull(), (finalData.B - finalData.C) / finalData.B, numpy.nan)
    finalData = finalData.groupby(['Sector', 'year']).agg({'C': 'mean', 'A': 'mean', 'B': ['mean', 'median'], 'D': ['mean', 'median']}).reset_index()

P.S. I think the problem is the line where I use NumPy to assign column D.
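One possible cause worth ruling out (a hypothesis only; none of the sample rows has B = 0): when B(n) is 0, `(B - C) / B` divides by zero, and pandas returns `inf` or `-inf` there instead of NaN. `mean` skips NaN but not infinities, so a group holding both signs sums to inf + (-inf) = NaN, while `median` only needs the middle value after sorting and can still be finite. The B and C values below are invented to show the effect and one way to count infinite D values.

```python
import numpy as np
import pandas as pd

# Invented values: two rows have B == 0, so D is infinite there
df = pd.DataFrame({
    'Sector': [10, 10, 10],
    'year': [2000, 2000, 2000],
    'B': [0.0, 2.0, 0.0],
    'C': [5.0, 1.0, -5.0],
})
df['D'] = np.where(df.B.notnull(), (df.B - df.C) / df.B, np.nan)
# (0 - 5) / 0 -> -inf, (2 - 1) / 2 -> 0.5, (0 + 5) / 0 -> inf

out = df.groupby(['Sector', 'year'])['D'].agg(['mean', 'median'])
print(out)                      # mean is NaN, median is 0.5
print(np.isinf(df['D']).sum())  # number of infinite D values: 2
```

Running `np.isinf(...).sum()` on the real `D` column before grouping would show whether this applies.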
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
