Pandas Correlation Groupby
Given a DataFrame like the one below, how would I get the correlation between two specific columns for each group in the 'ID' column? I believe the pandas 'corr' method computes the correlation between every pair of columns. If possible, I would also like to know how to compute the grouped correlation with the .agg function (e.g. with np.correlate).
What I have:
ID Val1 Val2 OtherData OtherData
A 5 4 x x
A 4 5 x x
A 6 6 x x
B 4 1 x x
B 8 2 x x
B 7 9 x x
C 4 8 x x
C 5 5 x x
C 2 1 x x
What I need:
ID Correlation_Val1_Val2
A 0.12
B 0.22
C 0.05
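A minimal sketch of the grouped correlation, including the .agg part of the question. It assumes only Val1 and Val2 matter, so the OtherData columns are left out. Two points shape it. First, `DataFrameGroupBy.agg` calls a user function once per column, so a function that needs two columns at the same time does not fit there, whereas `groupby(...).apply` passes each group as a sub-DataFrame. Second, `np.correlate` computes the cross-correlation of two sequences (a sum of products), not the Pearson coefficient; `Series.corr` or `np.corrcoef` give the Pearson value. The numbers under "What I need" are illustrative: with this sample data, group A comes out at 0.5.

```python
import numpy as np
import pandas as pd

# Sample data from the question (OtherData columns left out).
df = pd.DataFrame({
    "ID":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "Val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
})

# apply() receives each group as a DataFrame holding both columns,
# so the pairwise Pearson correlation can be computed per group.
pearson = (
    df.groupby("ID")[["Val1", "Val2"]]
      .apply(lambda g: g["Val1"].corr(g["Val2"]))
      .reset_index(name="Correlation_Val1_Val2")
)

# The same numbers via NumPy: np.corrcoef returns a 2x2 correlation
# matrix, and entry [0, 1] is corr(Val1, Val2).
numpy_version = (
    df.groupby("ID")[["Val1", "Val2"]]
      .apply(lambda g: np.corrcoef(g["Val1"], g["Val2"])[0, 1])
)

print(pearson)
```

Selecting `[["Val1", "Val2"]]` before `apply` keeps the grouping column out of each sub-DataFrame, which also avoids the warning recent pandas versions emit about applying functions to grouping columns.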
Solution 1:[1]
One more simple solution:
df.groupby('ID')[['Val1','Val2']].corr().unstack().iloc[:,1]
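Why `.iloc[:, 1]` picks the right column: `groupby(...).corr()` returns one 2x2 correlation matrix per ID, stacked on an (ID, column) MultiIndex. `unstack()` moves the inner level into a second column level, giving the columns (Val1, Val1), (Val1, Val2), (Val2, Val1), (Val2, Val2), so position 1 is (Val1, Val2). A short sketch, assuming the question's sample data, that also selects the same column by label, which reads more explicitly and does not depend on column order:

```python
import pandas as pd

df = pd.DataFrame({
    "ID":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "Val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
})

# One 2x2 correlation matrix per ID, unstacked so each ID is one row.
wide = df.groupby("ID")[["Val1", "Val2"]].corr().unstack()

by_position = wide.iloc[:, 1]        # Solution 1 as written
by_label = wide[("Val1", "Val2")]    # same column, selected by name

print(list(wide.columns))
```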
Solution 2:[2]
An earlier answer used .ix; since .ix has been deprecated (and was removed in pandas 1.0), use .iloc instead, with a few other minor changes:
df.groupby('ID')[['Val1','Val2']].corr().iloc[0::2][['Val2']] # to get pandas DataFrame
or
df.groupby('ID')[['Val1','Val2']].corr().iloc[0::2]['Val2'] # to get pandas Series
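The `iloc[0::2]` step keeps every other row of the stacked correlation matrices, i.e. the Val1 row of each ID, and the Val2 entry of that row is corr(Val1, Val2). This relies on each group contributing exactly two rows in Val1, Val2 order, which holds when exactly two columns are selected. A sketch of a label-based equivalent using `DataFrame.xs` on the inner index level, assuming the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "ID":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "Val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
})

stacked = df.groupby("ID")[["Val1", "Val2"]].corr()

# Positional version from Solution 2: every other row, column Val2.
positional = stacked.iloc[0::2]["Val2"]

# Label-based version: keep the rows whose inner index label is "Val1"
# (one per ID, indexed by ID alone), then take the Val2 column.
labelled = stacked.xs("Val1", level=1)["Val2"]

print(labelled)
```

The positional result keeps the (ID, "Val1") MultiIndex, while the `xs` result is indexed by ID only.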
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | VovaM |
Solution 2 | Ravaging Care |