How can I calculate the pairwise correlation between distinct class pairs across two feature columns and the target variable?

Most similar questions on this topic compute a single correlation value per feature column, showing how each feature in a dataset correlates with the target variable.

I'd like to do this row-wise instead: one correlation for each distinct pairing of values across the two feature columns.

e.g.,

This is an example of the dataset before the string labels are encoded numerically:

PrimaryProcedure	SecondaryProcedure	LengthOfStay
pre_op	brain_surgery	30
pre_op	spinal_implant	14
pre_op	spinal_implant	10
check_up	NULL	1
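To make the row-wise idea concrete, here is a minimal sketch that rebuilds the sample table above as a pandas DataFrame and attaches one pair label to each row. It assumes the NULL in SecondaryProcedure is read in as a missing value (None/NaN); the `ProcedurePair` column and the "NULL" placeholder string are hypothetical choices for illustration, not part of the original data.

```python
import pandas as pd

# Sample rows from the question; None stands in for the NULL secondary procedure.
df = pd.DataFrame({
    "PrimaryProcedure": ["pre_op", "pre_op", "pre_op", "check_up"],
    "SecondaryProcedure": ["brain_surgery", "spinal_implant", "spinal_implant", None],
    "LengthOfStay": [30, 14, 10, 1],
})

# Hypothetical helper column: one label per row for its (primary, secondary) pair.
# A missing secondary procedure is replaced with the string "NULL" before joining.
df["ProcedurePair"] = df["PrimaryProcedure"] + ", " + df["SecondaryProcedure"].fillna("NULL")

print(df)
```

The four rows collapse to three distinct labels, which are the units the question wants one correlation for.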

I'd like a table that shows how strongly each distinct class pair across the two procedure columns correlates with a patient's length of stay.

e.g.,

This is an example of the dataset I'd like to produce:

DistinctPairwiseProcedures	Correlation
(pre_op, brain_surgery)	0.7
(pre_op, spinal_implant)	0.4
(check_up)	0.9

In summary, I want a dataframe containing the distinct procedure pairs and how strongly each one correlates with the target variable, LengthOfStay. I could then sort this dataframe to see which combinations of procedures could usefully be fed into a regression model.
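One possible way to get a single correlation per distinct pair (a suggested technique, not something the question already does) is to turn each pair into a 0/1 indicator column and correlate each indicator with LengthOfStay. The Pearson correlation between a binary indicator and a continuous variable is the point-biserial correlation, so each value reflects how much rows with that pair differ in length of stay from all other rows. The sketch below assumes pandas and that NULL is read as a missing value; the function name `pair_correlations` is hypothetical.

```python
import pandas as pd

def pair_correlations(df, first="PrimaryProcedure", second="SecondaryProcedure",
                      target="LengthOfStay"):
    """Point-biserial correlation of each distinct (first, second) pair with target."""
    # One string label per row, e.g. "pre_op, spinal_implant"; a missing second value becomes "NULL".
    labels = df[first] + ", " + df[second].fillna("NULL")
    # One 0/1 indicator column per distinct pair.
    indicators = pd.get_dummies(labels, dtype=float)
    # Pearson correlation of each indicator column with the target column.
    corr = indicators.corrwith(df[target])
    return (corr.rename("Correlation")
                .rename_axis("DistinctPairwiseProcedures")
                .reset_index()
                # Strongest association first, regardless of sign.
                .sort_values("Correlation", key=abs, ascending=False, ignore_index=True))

# Sample rows from the question; None stands in for NULL.
df = pd.DataFrame({
    "PrimaryProcedure": ["pre_op", "pre_op", "pre_op", "check_up"],
    "SecondaryProcedure": ["brain_surgery", "spinal_implant", "spinal_implant", None],
    "LengthOfStay": [30, 14, 10, 1],
})

result = pair_correlations(df)
print(result)
```

On the four sample rows this gives roughly 0.89 for (pre_op, brain_surgery), -0.70 for (check_up, NULL) and -0.17 for (pre_op, spinal_implant); with so few rows these numbers are only illustrative. Sorting by absolute value matches the "how strongly" wording; drop `key=abs` to rank by signed correlation instead.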

The code below gets me a list of the distinct procedure pairs, but I'm not sure how to use this list as an index for calculating the correlation of each pair with LengthOfStay.

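# Distinct (PrimaryProcedure, SecondaryProcedure) pairs that actually occur in the data;
# itertools.product over the unique values would also return pairs that never occur together.
pairs = dataframe[['PrimaryProcedure', 'SecondaryProcedure']].drop_duplicates()
print(list(pairs.itertuples(index=False, name=None)))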

DistinctPairwiseProcedures
(pre_op, brain_surgery)
(pre_op, spinal_implant)
(check_up)
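On the narrower question of using the pairs as an index: rather than building a list and looping over it, pandas can group directly on both columns, which keys the result by each observed (PrimaryProcedure, SecondaryProcedure) pair. This sketch assumes pandas 1.1 or later (for `dropna=False`, which keeps the check_up row whose secondary procedure is NULL); the choice of count and mean as summary statistics is illustrative.

```python
import pandas as pd

# Sample rows from the question; None stands in for NULL.
df = pd.DataFrame({
    "PrimaryProcedure": ["pre_op", "pre_op", "pre_op", "check_up"],
    "SecondaryProcedure": ["brain_surgery", "spinal_implant", "spinal_implant", None],
    "LengthOfStay": [30, 14, 10, 1],
})

# Group on both procedure columns; the result has a MultiIndex of distinct pairs.
# dropna=False keeps pairs whose secondary value is missing.
per_pair = (df.groupby(["PrimaryProcedure", "SecondaryProcedure"], dropna=False)["LengthOfStay"]
              .agg(["count", "mean"]))
print(per_pair)
```

Per-pair means show how length of stay differs between pairs but are not correlations in themselves, and a pair seen only once (count 1) gives little evidence either way.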

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
