One-sided one sample T test by group on data frame?

I am trying to perform a one-sided, one sample T test by group on a pandas data frame in Python. I feel like I am close, but I can't quite finish the last bit. I was trying to follow something similar to these questions (One Sided One Sample T Test Python and T-test for groups within a Pandas dataframe for a unique id).

Say, for example, I have a data frame df:

import pandas as pd

df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'A',
                          'B', 'B', 'B', 'B', 'B', 
                          'C', 'C', 'C', 'C', 'C'],
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205, 
                             0.220, 0.225, 0.209, 0.218, 0.230, 
                             0.308, 0.291, 0.340, 0.444, 0.275]})

and I want to generate a new data frame df_pval with just two columns: the 'ID' and the p-value from a one-sided, one sample T test. I could do this in R like so:

library(dplyr)

df_pval <- df %>%
     group_by(ID) %>%
     summarise(res = list(t.test(value, mu = 0.220, alternative = 'greater')))

df_pval <- data.frame(ID = df_pval$ID,
     pval = sapply(df_pval$res, function(x) x[['p.value']]))

In fact, right now I use the os module to run an external R script for this, but I know it must be possible in Python alone. I have tried creating a 'groupby' object and then running .apply:

df_groupID = df.groupby('ID').agg({'value': list})
df_groupID.apply(lambda x: stats.ttest_1samp(x['value'], 0.220))

but this doesn't work, and I'm stuck. Any help would be greatly appreciated. Thank you in advance, and sorry if this has already been answered (and I just didn't understand the solution).
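The attempt above fails because DataFrame.apply defaults to axis=0: the lambda receives the whole 'value' column (indexed by 'A', 'B', 'C') rather than one row, so x['value'] raises a KeyError. It also runs a two-sided test. A minimal sketch of a patched version follows, assuming SciPy 1.6 or newer, where ttest_1samp accepts alternative='greater' (the counterpart of R's alternative = 'greater'):

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'ID': ['A'] * 5 + ['B'] * 5 + ['C'] * 5,
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205,
                             0.220, 0.225, 0.209, 0.218, 0.230,
                             0.308, 0.291, 0.340, 0.444, 0.275]})

# One row per ID, with that group's values collected into a list
df_groupID = df.groupby('ID').agg({'value': list})

# axis=1 passes each row to the lambda, so x['value'] is one group's list.
# alternative='greater' assumes SciPy >= 1.6.
pvals = df_groupID.apply(
    lambda x: stats.ttest_1samp(x['value'], popmean=0.220,
                                alternative='greater').pvalue,
    axis=1)

# Two columns (ID, pval), like the R data.frame(ID = ..., pval = ...)
df_pval = pvals.rename('pval').reset_index()
print(df_pval)
```

Collecting the values into lists first is not required; applying the test directly to df.groupby('ID')['value'] works as well.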



Solution 1:[1]

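For a one-sided test, this should work with any version of SciPy, because it only halves the two-sided p-value. Halving is valid only when the sample mean lies on the side of popmean that the alternative predicts; otherwise the one-sided p-value is 1 - p/2: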

import pandas as pd
from scipy import stats
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'A', 
                          'B', 'B', 'B', 'B', 'B', 
                          'C', 'C', 'C', 'C', 'C'],
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205, 
                             0.220, 0.225, 0.209, 0.218, 0.230, 
                             0.308, 0.291, 0.340, 0.444, 0.275]})


df.groupby('ID')['value'].apply(lambda x: stats.ttest_1samp(list(x),
                                popmean=0.220).pvalue/2) # halve the two-sided p-value for a one-sided test (alternative='greater' or 'less')

Gives:

ID
A    0.000665
B    0.457619
C    0.010342
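Halving matches the one-sided p-value only when the t statistic has the sign the alternative predicts. For group A the sample mean is 0.1988, below 0.220, so t is negative and 0.000665 is the p-value for alternative='less'; for alternative='greater' it would be 1 - 0.000665. A minimal sketch that applies this rule without the alternative keyword is below; one_sided_p is a hypothetical helper written for this example, not a SciPy function.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'ID': ['A'] * 5 + ['B'] * 5 + ['C'] * 5,
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205,
                             0.220, 0.225, 0.209, 0.218, 0.230,
                             0.308, 0.291, 0.340, 0.444, 0.275]})


def one_sided_p(sample, popmean, alternative='greater'):
    """Hypothetical helper: one-sided p-value derived from the two-sided test."""
    res = stats.ttest_1samp(sample, popmean)
    t, p_two = res.statistic, res.pvalue
    # Halve only if t points the way the alternative predicts;
    # otherwise the one-sided p-value is the complement of the half.
    if alternative == 'greater':
        return p_two / 2 if t > 0 else 1 - p_two / 2
    if alternative == 'less':
        return p_two / 2 if t < 0 else 1 - p_two / 2
    raise ValueError("alternative must be 'greater' or 'less'")


# Two-column result: ID and the one-sided ('greater') p-value
df_pval = (df.groupby('ID')['value']
             .apply(lambda x: one_sided_p(x, 0.220, 'greater'))
             .rename('pval')
             .reset_index())
print(df_pval)
```

On SciPy 1.6 or newer, passing alternative='greater' to ttest_1samp should give the same numbers directly.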

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

[1] Solution 1, Stack Overflow