For loop for calculating p-value on each row of dataframe

I have a dataset of more than 30k rows. Each row holds triplicate measurements for two groups, and I want to run a Wilcoxon test comparing the two groups. Before the code below, I used iloc to select the columns for each group:

groupA = subset.iloc[:,4:7]
groupB = subset.iloc[:,7:10]

I am trying to write a for loop that goes over every row and runs the Wilcoxon test on the two groups:

# Calculate the p-value using the Wilcoxon test

import scipy.stats as stats
from scipy.stats import ranksums


def stats(x):
    ranksums(x.groupA, x.groupB)


result = []

for i in range(len(subset)):
    row = subset.iloc[i]
    result.append(
        stats
    )
print(result)

The code runs for over 4 minutes and returns the same values for every row, which seems incorrect:

#output: [RanksumsResult(statistic=-10.39971253610489, pvalue=2.4868227406906807e-25), RanksumsResult(statistic=-10.39971253610489, pvalue=2.4868227406906807e-25), RanksumsResult(statistic=-10.39971253610489, pvalue=2.4868227406906807e-25)....

Update: I tried creating the groups inside the for loop, but it takes longer (9 minutes) and returns only one result.

my_list = []

for i, row in subset.iterrows():
   groupA = subset.iloc[i,4:7]
   groupB = subset.iloc[i,7:10]
   my_list = ranksums(groupA, groupB)

my_list

It returns:

RanksumsResult(statistic=array([ 7.07673793, -1.9041147 , -12.48672522]), pvalue=array([1.47587257e-12, 5.68952474e-02, 8.82102979e-36]))

which means it is not applying the test to each row; it appears to be applied to each column of the group instead. I wonder whether I should build arrays inside the for loop and then apply the test, but how can I do this without it taking so long?
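One possible approach, sketched under assumptions rather than taken from the original post: convert both column blocks to NumPy arrays once, then call ranksums on one pair of 1-D rows at a time and collect every result in a list instead of overwriting a single variable. The small `subset` built here is a hypothetical stand-in (four placeholder columns followed by three replicates per group); the column names and the output columns `statistic` and `pvalue` are invented for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import ranksums

# Hypothetical stand-in for `subset`: 4 leading columns, then 3 + 3 replicates.
meta = pd.DataFrame(np.zeros((3, 4)), columns=["w", "x", "y", "z"])
values = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6],
     [1, 4, 5, 2, 3, 6],
     [6, 5, 4, 3, 2, 1]],
    columns=["A1", "A2", "A3", "B1", "B2", "B3"],
    dtype=float,
)
subset = pd.concat([meta, values], axis=1)

# Slice each group once, outside the loop, as plain NumPy arrays.
a = subset.iloc[:, 4:7].to_numpy(dtype=float)
b = subset.iloc[:, 7:10].to_numpy(dtype=float)

# One test per row: each call receives two 1-D arrays of three values.
results = [ranksums(row_a, row_b) for row_a, row_b in zip(a, b)]

# Attach the per-row results to a copy of the data.
out = subset.copy()
out["statistic"] = [r.statistic for r in results]
out["pvalue"] = [r.pvalue for r in results]
print(out[["statistic", "pvalue"]])
```

Slicing once and iterating over NumPy rows avoids a pandas `.iloc` lookup on every iteration, which is often a large share of the runtime in loops like the ones above, although the actual speedup on 30k rows has not been measured here.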

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
