R: Running multiple tests on a progressively increasing selection of fixed data points

I am new to programming in R and 'stack overflow'. I would appreciate any help. This is my question.

I have a fixed set of data that looks like this (the actual data has more participants than P1 to P4):

Week  1    2    3    4    5    6    7    8
P1    2    3    ...  ...  ...  ...  ...  ...
P2    0    1    ...  ...  ...  ...  ...  ...
P3    4    2    ...  ...  ...  ...  ...  ...
P4    2.5  6    ...  ...  ...  ...  ...  ...

For each participant, I intend to compare using:

  • Week 1 vs Week 5, followed by
  • Weeks 1, 2 vs Week 5
  • Weeks 1, 2, 3 vs Week 5
  • Weeks 1, 2, 3, 4 vs Week 5
  • Weeks 1, 2 vs Weeks 5, 6
  • Weeks 1, 2, 3 vs Weeks 5, 6

... as well as

  • Week 1 vs Weeks 5 and 6
  • Week 1 vs Weeks 5, 6 and 7
  • Week 1 vs Weeks 5, 6, 7 and 8

Basically all combinations increasing in the number of weeks tested before (Weeks 1 to 4) and after (Weeks 5 to 8) treatment.

How do I run this sequential increase and comparison by weeks? This "fixed progressive selection" should be independent of whichever test is used; the choice of test is not the point of this question. I know I have to use a group-by function to run the steps for each participant, but I am not sure how to proceed with the sequential increase of weeks for each participant.

Would be great to hear any advice. Thanks!



Solution 1:[1]

The core of this question appears to be how to iterate over the desired combinations of groups.

I will first create a list that holds the names of the desired columns for each comparison, then iterate over that list and show how to access those columns.

I have worked out your question for a simple paired t-test on two separate vectors of values. Depending on which comparisons you'll eventually perform on the data, you might have to format the data in another way, perhaps as a data frame in either long or wide format. I'll leave this as an exercise for the reader.
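As a rough illustration of the long format mentioned above, here is a minimal base-R sketch. The small data frame and the names `long` and `pre_mean` are made up for this example and are not part of the original answer; other tests may need a different layout.

```r
# Hypothetical wide data: one row per participant, one column per week
df <- data.frame(W1 = c(2, 0, 4, 2.5),
                 W2 = c(3, 1, 2, 6),
                 W5 = c(1, 2, 3, 4))

# Long format: one row per participant/week combination
long <- data.frame(
  participant = rep(paste0("P", seq_len(nrow(df))), times = ncol(df)),
  week        = rep(names(df), each = nrow(df)),
  value       = unlist(df, use.names = FALSE)
)

# Per-participant mean over a chosen set of weeks
pre_mean <- aggregate(value ~ participant,
                      data = subset(long, week %in% c("W1", "W2")),
                      FUN = mean)
print(pre_mean)
```

In the long layout, selecting a set of weeks becomes a row filter on `week` rather than a column selection.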

example dataset
df <- data.frame( W1 = c(2,0,4,2.5), W2 = c(3,1,2,6),
                  W3 = c(1,2,3,4),   W4 = c(1,2,3,4),
                  W5 = c(1,2,3,4),   W6 = c(1,2,3,4),
                  W7 = c(1,2,3,4),   W8 = c(1,2,3,4))

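The example in the question uses only digits for column names. These are not syntactically valid names in R, and data.frame() would by default rename them (to X1, X2, ...), so the columns are prefixed with W here.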
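To see what happens with digit-only column names, the following small sketch may help. The data frames `d1`, `d2` and `d3` are illustrative only and are not part of the original answer.

```r
# By default, data.frame() repairs non-syntactic names (check.names = TRUE)
d1 <- data.frame(`1` = c(2, 0), `2` = c(3, 1))
names(d1)   # "X1" "X2"

# check.names = FALSE keeps the digits, but the columns then need quoting
d2 <- data.frame(`1` = c(2, 0), `2` = c(3, 1), check.names = FALSE)
names(d2)   # "1" "2"
d2[["1"]]   # access by string
d2$`1`      # access with backticks

# Prefixing yields syntactic names like the ones used in this answer
d3 <- d2
names(d3) <- paste0("W", names(d3))
names(d3)   # "W1" "W2"
```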

define weeks/columns per group
groupA = c('W1', 'W2', 'W3', 'W4')
groupB = c('W5', 'W6', 'W7', 'W8')

generate a list of all desired comparisons
comparisons = list()

for(a_len in seq_along(groupA)) for(b_len in seq_along(groupB)) {

  comp = list(A = head(groupA, a_len), B = head(groupB, b_len))
  comparisons = append(comparisons, list(comp))

}

This produces a list of 16 comparison pairs (4 × 4), e.g.:

> comparisons[[10]]$A
[1] "W1" "W2" "W3"

> comparisons[[10]]$B
[1] "W5" "W6"
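The same list could also be built without nested loops. This is only a sketch of one alternative, using `expand.grid()` and `Map()` from base R; the name `grid` is chosen here for illustration.

```r
groupA <- c("W1", "W2", "W3", "W4")
groupB <- c("W5", "W6", "W7", "W8")

# expand.grid() varies its first argument fastest, so listing b_len first
# reproduces the order of the nested loops (a_len outer, b_len inner)
grid <- expand.grid(b_len = seq_along(groupB), a_len = seq_along(groupA))

# Build one list(A = ..., B = ...) entry per row of the grid
comparisons <- Map(
  function(a_len, b_len) list(A = head(groupA, a_len), B = head(groupB, b_len)),
  grid$a_len, grid$b_len
)

length(comparisons)   # 16
comparisons[[10]]$A   # "W1" "W2" "W3"
comparisons[[10]]$B   # "W5" "W6"
```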

iterate through list of comparisons
for(groups in comparisons) {
  
  cat(groups$A, 'versus', groups$B, '\n')
  
  # this is where your analysis goes
  #
  # example: paired t-test on the average of weeks of group A,
  # vs average of weeks of group B
  
  # step 1: take a subset of `df` with just the weeks we're interested in
  dfA = df[groups$A]
  dfB = df[groups$B]
  
  # step 2: take the row-average of all subsetted weeks
  avgA = rowMeans(dfA)
  avgB = rowMeans(dfB)
  
  # step 3: perform paired t-test and print
  test = t.test(avgA, avgB, paired = TRUE)
  print(test)
  
}

result
W1 versus W5 

    Paired t-test

data:  avgA and avgB
t = -0.46852, df = 3, p-value = 0.6714
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.9222  2.1722
sample estimates:
mean of the differences 
                 -0.375 

W1 versus W5 W6 

    Paired t-test

data:  avgA and avgB
t = -0.46852, df = 3, p-value = 0.6714
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.9222  2.1722
sample estimates:
mean of the differences 
                 -0.375 

W1 versus W5 W6 W7 

    Paired t-test

data:  avgA and avgB
t = -0.46852, df = 3, p-value = 0.6714
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.9222  2.1722
sample estimates:
mean of the differences 
                 -0.375 

W1 versus W5 W6 W7 W8 

    Paired t-test

data:  avgA and avgB
t = -0.46852, df = 3, p-value = 0.6714
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.9222  2.1722
sample estimates:
mean of the differences 
                 -0.375 

W1 W2 versus W5 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.896466  2.021466
sample estimates:
mean of the differences 
                 0.0625 

W1 W2 versus W5 W6 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.896466  2.021466
sample estimates:
mean of the differences 
                 0.0625 

W1 W2 versus W5 W6 W7 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.896466  2.021466
sample estimates:
mean of the differences 
                 0.0625 

W1 W2 versus W5 W6 W7 W8 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.896466  2.021466
sample estimates:
mean of the differences 
                 0.0625 

W1 W2 W3 versus W5 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.264311  1.347644
sample estimates:
mean of the differences 
             0.04166667 

W1 W2 W3 versus W5 W6 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.264311  1.347644
sample estimates:
mean of the differences 
             0.04166667 

W1 W2 W3 versus W5 W6 W7 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.264311  1.347644
sample estimates:
mean of the differences 
             0.04166667 

W1 W2 W3 versus W5 W6 W7 W8 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.264311  1.347644
sample estimates:
mean of the differences 
             0.04166667 

W1 W2 W3 W4 versus W5 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.9482332  1.0107332
sample estimates:
mean of the differences 
                0.03125 

W1 W2 W3 W4 versus W5 W6 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.9482332  1.0107332
sample estimates:
mean of the differences 
                0.03125 

W1 W2 W3 W4 versus W5 W6 W7 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.9482332  1.0107332
sample estimates:
mean of the differences 
                0.03125 

W1 W2 W3 W4 versus W5 W6 W7 W8 

    Paired t-test

data:  avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.9482332  1.0107332
sample estimates:
mean of the differences 
                0.03125
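
Printing every test in full gets long quickly. As a possible variation (not part of the original answer), the key numbers of each test could be collected into one data frame. The names `rows` and `results` and the choice of columns are assumptions made for this sketch.

```r
df <- data.frame(W1 = c(2, 0, 4, 2.5), W2 = c(3, 1, 2, 6),
                 W3 = c(1, 2, 3, 4),   W4 = c(1, 2, 3, 4),
                 W5 = c(1, 2, 3, 4),   W6 = c(1, 2, 3, 4),
                 W7 = c(1, 2, 3, 4),   W8 = c(1, 2, 3, 4))
groupA <- c("W1", "W2", "W3", "W4")
groupB <- c("W5", "W6", "W7", "W8")

rows <- list()
for (a_len in seq_along(groupA)) for (b_len in seq_along(groupB)) {
  A <- head(groupA, a_len)
  B <- head(groupB, b_len)
  # same analysis as above: paired t-test on the row averages
  test <- t.test(rowMeans(df[A]), rowMeans(df[B]), paired = TRUE)
  # one summary row per comparison
  rows[[length(rows) + 1]] <- data.frame(
    A         = paste(A, collapse = " "),
    B         = paste(B, collapse = " "),
    t         = unname(test$statistic),
    p_value   = test$p.value,
    mean_diff = unname(test$estimate)
  )
}
results <- do.call(rbind, rows)
print(results)
```

Each row of `results` then corresponds to one of the comparisons printed above, which makes it easier to sort or filter them afterwards.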

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Caspar V.