Pandas t-test using rows as the arrays
I need to calculate a p-value for two sets of data, comparing each row in one DataFrame with the corresponding row in another DataFrame. For example, array1 would be the five Pep Ctrl values in row 300 of df1 (not including stdev and Ctrl average), and array2 would be the five MCI values in row 300 of df2 (not including stdev).
df1:
| | Pep Ctrl 1 | Pep Ctrl 2 | Pep Ctrl 3 | Pep Ctrl 4 | Pep Ctrl 5 | stdev | Ctrl average |
|---|---|---|---|---|---|---|---|
| 300 | 47591000.0 | NaN | 49576000.0 | 41288000.0 | 61727000.0 | 8.551730e+06 | 4.174675e+07 |
| 301 | 4305900.0 | 2670800.0 | NaN | NaN | 7338400.0 | 2.368407e+06 | 4.170877e+06 |
| 302 | 11466000.0 | 3799400.0 | NaN | 18552000.0 | 31661000.0 | 1.184124e+07 | 1.546393e+07 |
| 303 | 11255000.0 | 5402300.0 | 18337000.0 | 19706000.0 | 40286000.0 | 1.321849e+07 | 1.803413e+07 |
df2:
| | MCI 1 vs Ctrl normalized | MCI 2 vs Ctrl normalized | MCI 3 vs Ctrl normalized | MCI 4 vs Ctrl normalized | MCI 5 vs Ctrl normalized | stdev |
|---|---|---|---|---|---|---|
| 300 | 1.054045e+08 | 4.980206e+07 | 4.764870e+07 | 1.834201e+07 | 2.994124e+07 | 3.346473e+07 |
| 301 | 1.019931e+07 | 3.309509e+06 | 6.595145e+06 | 1.089385e+07 | NaN | 3.508776e+06 |
| 302 | 3.288333e+07 | 6.953062e+06 | 1.430190e+07 | 4.988915e+06 | 2.310888e+07 | 1.162495e+07 |
| 303 | 3.332308e+07 | 1.682790e+07 | 2.951138e+07 | 9.474570e+06 | 2.965893e+07 | 1.014219e+07 |
I need to do a two-tailed t-test assuming equal variances for each row, and then add the resulting p-value as the last column. Alternatively, if SciPy has an option that takes just the number of items, the standard deviation, and the average, that could also work.
This is what I tried:
group1 = [df1['Pep Ctrl 1'],df1['Pep Ctrl 2'],df1['Pep Ctrl 3'],df1['Pep Ctrl 4'],df1['Pep Ctrl 5']]
group2 = [df2['MCI 1 vs Ctrl normalized'], df2['MCI 2 vs Ctrl normalized'], df2['MCI 3 vs Ctrl normalized'], df2['MCI 4 vs Ctrl normalized'], df2['MCI 5 vs Ctrl normalized']]
ttest = stats.ttest_ind(a=group1,b=group2,axis = 1, equal_var = True)
Any help would be appreciated.
df1 constructor:
{'Pep Ctrl 1': [47591000.0, 4305900.0, 11466000.0, 11255000.0],
'Pep Ctrl 2': [nan, 2670800.0, 3799400.0, 5402300.0],
'Pep Ctrl 3': [49576000.0, nan, nan, 18337000.0],
'Pep Ctrl 4': [41288000.0, nan, 18552000.0, 19706000.0],
'Pep Ctrl 5': [61727000.0, 7338400.0, 31661000.0, 40286000.0],
'stdev': [8551730.0, 2368407.0, 11841240.0, 13218490.0],
'Ctrl average': [41746750.0, 4170877.0, 15463930.0, 18034130.0]}
df2 constructor:
{'MCI 1 vs Ctrl normalized': [105404500.0, 10199310.0, 32883330.0, 33323080.0],
'MCI 2 vs Ctrl normalized': [49802060.0, 3309509.0, 6953062.0, 16827900.0],
'MCI 3 vs Ctrl normalized': [47648700.0, 6595145.0, 14301900.0, 29511380.0],
'MCI 4 vs Ctrl normalized': [18342010.0, 10893850.0, 4988915.0, 9474570.0],
'MCI 5 vs Ctrl normalized': [29941240.0, nan, 23108880.0, 29658930.0],
'stdev': [33464730.0, 3508776.0, 11624950.0, 10142190.0]}
Solution 1:[1]
You could use iterrows to iterate over the sample columns of df1 and compare each row with the row in df2 that has the same index label. Passing nan_policy='omit' makes ttest_ind ignore the NaN entries:
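from scipy import stats

# Sample columns of df2 (everything except the summary column)
df2_cols = df2.columns.drop('stdev')
# One t-test per row: df2 samples vs. the df1 row with the same index label
out = [stats.ttest_ind(df2.loc[i, df2_cols], row, equal_var=True, nan_policy='omit')
       for i, row in df1.drop(columns=['stdev', 'Ctrl average']).iterrows()]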
Output:
[Ttest_indResult(statistic=0.010483243999151896, pvalue=0.9919282503324176),
Ttest_indResult(statistic=1.2563264347346306, pvalue=0.26449954642964396),
Ttest_indResult(statistic=0.009874028613226149, pvalue=0.9923973079846519),
Ttest_indResult(statistic=0.6390907092148139, pvalue=0.5406265164807074)]
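To get the p-values into a column, as the question asks, a vectorized call may be simpler than a loop. The sketch below is only one possible approach. It assumes both frames list their rows in the same order, that `stats.ttest_ind` is applied with `axis=1` to the two blocks of sample columns, and that dropping NaN values with `nan_policy='omit'` is acceptable. The column name `p-value` and the choice to attach it to df2 are illustrative, not from the question. The last part cross-checks the result with `stats.ttest_ind_from_stats`, which takes means, standard deviations, and counts directly. The summary statistics are recomputed from the raw columns there, because the stored `Ctrl average` values do not appear to equal the row means of the control values shown.

```python
import numpy as np
import pandas as pd
from scipy import stats

nan = np.nan
idx = [300, 301, 302, 303]

# Data copied from the constructors in the question
df1 = pd.DataFrame({
    'Pep Ctrl 1': [47591000.0, 4305900.0, 11466000.0, 11255000.0],
    'Pep Ctrl 2': [nan, 2670800.0, 3799400.0, 5402300.0],
    'Pep Ctrl 3': [49576000.0, nan, nan, 18337000.0],
    'Pep Ctrl 4': [41288000.0, nan, 18552000.0, 19706000.0],
    'Pep Ctrl 5': [61727000.0, 7338400.0, 31661000.0, 40286000.0],
    'stdev': [8551730.0, 2368407.0, 11841240.0, 13218490.0],
    'Ctrl average': [41746750.0, 4170877.0, 15463930.0, 18034130.0],
}, index=idx)
df2 = pd.DataFrame({
    'MCI 1 vs Ctrl normalized': [105404500.0, 10199310.0, 32883330.0, 33323080.0],
    'MCI 2 vs Ctrl normalized': [49802060.0, 3309509.0, 6953062.0, 16827900.0],
    'MCI 3 vs Ctrl normalized': [47648700.0, 6595145.0, 14301900.0, 29511380.0],
    'MCI 4 vs Ctrl normalized': [18342010.0, 10893850.0, 4988915.0, 9474570.0],
    'MCI 5 vs Ctrl normalized': [29941240.0, nan, 23108880.0, 29658930.0],
    'stdev': [33464730.0, 3508776.0, 11624950.0, 10142190.0],
}, index=idx)

# Keep only the sample columns; the summary columns are not observations
ctrl = df1.drop(columns=['stdev', 'Ctrl average'])
mci = df2.drop(columns=['stdev'])

# One two-sided, equal-variance t-test per row (axis=1), skipping NaN values
res = stats.ttest_ind(mci.to_numpy(), ctrl.to_numpy(), axis=1,
                      equal_var=True, nan_policy='omit')

# 'p-value' is a hypothetical column name; it is appended as the last column
df2['p-value'] = np.asarray(res.pvalue, dtype=float)

# Cross-check from summary statistics only: mean, sample std (ddof=1), count
check = stats.ttest_ind_from_stats(
    mean1=mci.mean(axis=1).to_numpy(),
    std1=mci.std(axis=1).to_numpy(),
    nobs1=mci.count(axis=1).to_numpy(),
    mean2=ctrl.mean(axis=1).to_numpy(),
    std2=ctrl.std(axis=1).to_numpy(),
    nobs2=ctrl.count(axis=1).to_numpy(),
    equal_var=True,
)
p_from_stats = np.asarray(check.pvalue, dtype=float)

print(df2['p-value'])
print(p_from_stats)
```

Both printed sets of p-values should agree with each other and with the per-row results listed above. If the rows of the two frames might be ordered differently, aligning them first (for example with `ctrl.reindex(mci.index)`) would be a safer starting point.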
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
