For loop for calculating a p-value on each row of a DataFrame
I have a subset of triplicates, each belonging to a group. The dataset contains more than 30k rows with information for each triplicate. I want to perform a Wilcoxon test on these two groups. Before the code below, I used iloc to select the columns for each group:
groupA = subset.iloc[:,4:7]
groupB = subset.iloc[:,7:10]
I am trying to write a for loop that goes over every row and runs the Wilcoxon test on the two groups:
# Calculate the p-value using the Wilcoxon test
import scipy.stats as stats
from scipy.stats import ranksums

def stats(x):
    ranksums(x.groupA, x.groupB)

result = []
for i in range(len(subset)):
    row = subset.iloc[i]
    result.append(
        stats
    )
print(result)
The code runs for over 4 min and returns the same values for every row, which seems incorrect:
#output: [RanksumsResult(statistic=-10.39971253610489, pvalue=2.4868227406906807e-25), RanksumsResult(statistic=-10.39971253610489, pvalue=2.4868227406906807e-25), RanksumsResult(statistic=-10.39971253610489, pvalue=2.4868227406906807e-25)....
Update: I tried creating the groups inside the for loop, but it takes longer (9 min) and returns only one result.
my_list = []
for i, row in subset.iterrows():
    groupA = subset.iloc[i,4:7]
    groupB = subset.iloc[i,7:10]
    my_list = ranksums(groupA, groupB)
my_list
It returns:
RanksumsResult(statistic=array([ 7.07673793, -1.9041147 , -12.48672522]), pvalue=array([1.47587257e-12, 5.68952474e-02, 8.82102979e-36]))
This means the test is not being applied to each row; apparently it is being applied to each column of the groups instead. I wonder if I should create arrays inside the for loop and then apply the test, but how can I do this without it taking so long?
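One possible way to get a result per row without an explicit loop is sketched below. It assumes a SciPy version in which ranksums accepts an axis argument (the array-valued output above suggests it does), and it uses a small random DataFrame as a hypothetical stand-in for subset, with the same column layout (triplicate A in columns 4:7, triplicate B in columns 7:10). The names group_a, group_b, and results are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy.stats import ranksums

# Hypothetical stand-in for `subset`: 4 leading columns, then two triplicates.
rng = np.random.default_rng(0)
subset = pd.DataFrame(rng.normal(size=(5, 10)))

# Slice each group once, outside any loop, and convert to plain NumPy arrays.
group_a = subset.iloc[:, 4:7].to_numpy(dtype=float)
group_b = subset.iloc[:, 7:10].to_numpy(dtype=float)

# axis=1 compares the three A values with the three B values within each row
# (assumes a SciPy release where ranksums supports the `axis` keyword).
res = ranksums(group_a, group_b, axis=1)

# Attach one statistic and one p-value per row to a copy of the data.
results = subset.copy()
results["statistic"] = res.statistic
results["pvalue"] = res.pvalue
print(results[["statistic", "pvalue"]])
```

Calling ranksums once on the two 2-D arrays avoids re-slicing the DataFrame on every iteration, which is likely where most of the time in the loops above goes. The default axis=0 runs the test down each column, which would explain getting exactly three statistics back.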
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
