R: Running multiple tests by selecting (and increasing) the number of fixed data points
I am new to programming in R and to Stack Overflow, and I would appreciate any help. This is my question.
I have a fixed set of data that looks like this (the actual data has more participants than P1 to P4):
| Week | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| P1 | 2 | 3 | ... | ... | ... | ... | ... | ... |
| P2 | 0 | 1 | ... | ... | ... | ... | ... | ... |
| P3 | 4 | 2 | ... | ... | ... | ... | ... | ... |
| P4 | 2.5 | 6 | ... | ... | ... | ... | ... | ... |
For each participant, I intend to compare:
- Week 1 vs Week 5, followed by
- Weeks 1, 2 vs Week 5
- Weeks 1, 2, 3 vs Week 5
- Weeks 1, 2, 3, 4 vs Week 5
- Weeks 1, 2 vs Weeks 5, 6
- Weeks 1, 2, 3 vs Weeks 5, 6
... as well as
- Week 1 vs Weeks 5 and 6
- Week 1 vs Weeks 5, 6 and 7
- Week 1 vs Weeks 5, 6, 7 and 8
Basically, I want all combinations that increase the number of weeks tested before (Weeks 1 to 4) and after (Weeks 5 to 8) treatment.
How do I run this sequential increase and comparison by weeks? This "fixed progressive selection" should be independent of whichever test is used; the choice of test is not the point of this question. I know I have to use the group by function to run the steps for each participant, but I am not sure how to proceed with the sequential increase of weeks for each participant.
Would be great to hear any advice. Thanks!
Solution 1:[1]
How to iterate over the desired combinations of groups appears to be the core of this question.
I will first create a list that holds the names of the desired columns for each permutation, then iterate over that list and show how to access those columns.
I have worked out your question for a simple paired t-test, with two separate vectors of values. Depending on which comparisons you'll eventually perform on the data, you might have to format the data in another way. Perhaps as a dataframe, either in long or wide format. We'll leave this as an exercise to the reader.
example dataset
df <- data.frame( W1 = c(2,0,4,2.5), W2 = c(3,1,2,6),
W3 = c(1,2,3,4), W4 = c(1,2,3,4),
W5 = c(1,2,3,4), W6 = c(1,2,3,4),
W7 = c(1,2,3,4), W8 = c(1,2,3,4))
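The example in the question uses only digits for column names, but such names are not syntactically valid in R (by default `data.frame()` would rename them to `X1`, `X2`, and so on), so I have prefixed each week with `W`.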
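To illustrate the point about digit-only column names, here is a minimal sketch. The two-column data frames `repaired` and `kept` are made up for this illustration and are not part of the question's data.

```r
# Digit-only names are not syntactically valid, so data.frame() repairs them
# (via make.names()) unless check.names = FALSE is given.
repaired <- data.frame(`1` = c(2, 0), `2` = c(3, 1))
names(repaired)   # "X1" "X2"

# With check.names = FALSE the names are kept as they are ...
kept <- data.frame(`1` = c(2, 0), `2` = c(3, 1), check.names = FALSE)
names(kept)       # "1" "2"

# ... but then they have to be quoted with backticks whenever they are used.
kept$`1`
```

Prefixing the weeks with a letter, as done in the example dataset, avoids the need for backticks altogether.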
define weeks/columns per group
groupA = c('W1', 'W2', 'W3', 'W4')
groupB = c('W5', 'W6', 'W7', 'W8')
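If the real data has more weeks, the two groups could also be derived from the column names instead of being typed out. The following is only a sketch under two assumptions: every column is named `W` followed by the week number, and `first_post_week` (a hypothetical variable introduced here) marks the first week after treatment.

```r
df <- data.frame(W1 = c(2, 0, 4, 2.5), W2 = c(3, 1, 2, 6),
                 W3 = c(1, 2, 3, 4),   W4 = c(1, 2, 3, 4),
                 W5 = c(1, 2, 3, 4),   W6 = c(1, 2, 3, 4),
                 W7 = c(1, 2, 3, 4),   W8 = c(1, 2, 3, 4))

# assumption: treatment happens between week 4 and week 5
first_post_week <- 5

# strip the leading "W" to recover the week number of each column
week_no <- as.integer(sub("^W", "", names(df)))

groupA <- names(df)[week_no <  first_post_week]   # weeks before treatment
groupB <- names(df)[week_no >= first_post_week]   # weeks after treatment
```

For the example dataset this gives the same `groupA` and `groupB` as the hard-coded vectors.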
generate a list of all desired comparisons
comparisons = list()
for(a_len in seq_along(groupA)) for(b_len in seq_along(groupB)) {
  comp = list(A = head(groupA, a_len), B = head(groupB, b_len))
  comparisons = append(comparisons, list(comp))
}
This produces a list of 16 comparison pairs (4 × 4), e.g.:
> comparisons[[10]]$A
[1] "W1" "W2" "W3"
> comparisons[[10]]$B
[1] "W5" "W6"
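The same list can be built without growing it inside a loop. This is an alternative sketch, not a change to the approach: `expand.grid()` enumerates every pair of group sizes and `Map()` turns each pair into a comparison. The names `sizes` and `comparisons2` are introduced only for this illustration.

```r
groupA <- c("W1", "W2", "W3", "W4")
groupB <- c("W5", "W6", "W7", "W8")

# One row per (a_len, b_len) pair. expand.grid() varies its first argument
# fastest, so b_len is listed first to match the order of the nested loops.
sizes <- expand.grid(b_len = seq_along(groupB), a_len = seq_along(groupA))

comparisons2 <- Map(
  function(a_len, b_len) list(A = head(groupA, a_len), B = head(groupB, b_len)),
  sizes$a_len, sizes$b_len
)

length(comparisons2)     # one entry per pair of group sizes
comparisons2[[10]]$A
comparisons2[[10]]$B
```

Because the order matches the nested loops, `comparisons2[[10]]` holds the same pair as `comparisons[[10]]`.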
iterate through list of comparisons
for(groups in comparisons) {
  cat(groups$A, 'versus', groups$B, '\n')
  # this is where your analysis goes
  #
  # example: paired t-test on the average of weeks of group A,
  # vs average of weeks of group B
  # step 1: take a subset of `df` with just the weeks we're interested in
  dfA = df[groups$A]
  dfB = df[groups$B]
  # step 2: take the row-average of all subsetted weeks
  avgA = rowMeans(dfA)
  avgB = rowMeans(dfB)
  # step 3: perform paired t-test and print
  test = t.test(avgA, avgB, paired = TRUE)
  print(test)
}
result
W1 versus W5
Paired t-test
data: avgA and avgB
t = -0.46852, df = 3, p-value = 0.6714
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.9222 2.1722
sample estimates:
mean of the differences
-0.375
W1 versus W5 W6
Paired t-test
data: avgA and avgB
t = -0.46852, df = 3, p-value = 0.6714
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.9222 2.1722
sample estimates:
mean of the differences
-0.375
W1 versus W5 W6 W7
Paired t-test
data: avgA and avgB
t = -0.46852, df = 3, p-value = 0.6714
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.9222 2.1722
sample estimates:
mean of the differences
-0.375
W1 versus W5 W6 W7 W8
Paired t-test
data: avgA and avgB
t = -0.46852, df = 3, p-value = 0.6714
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.9222 2.1722
sample estimates:
mean of the differences
-0.375
W1 W2 versus W5
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.896466 2.021466
sample estimates:
mean of the differences
0.0625
W1 W2 versus W5 W6
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.896466 2.021466
sample estimates:
mean of the differences
0.0625
W1 W2 versus W5 W6 W7
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.896466 2.021466
sample estimates:
mean of the differences
0.0625
W1 W2 versus W5 W6 W7 W8
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.896466 2.021466
sample estimates:
mean of the differences
0.0625
W1 W2 W3 versus W5
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.264311 1.347644
sample estimates:
mean of the differences
0.04166667
W1 W2 W3 versus W5 W6
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.264311 1.347644
sample estimates:
mean of the differences
0.04166667
W1 W2 W3 versus W5 W6 W7
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.264311 1.347644
sample estimates:
mean of the differences
0.04166667
W1 W2 W3 versus W5 W6 W7 W8
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.264311 1.347644
sample estimates:
mean of the differences
0.04166667
W1 W2 W3 W4 versus W5
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.9482332 1.0107332
sample estimates:
mean of the differences
0.03125
W1 W2 W3 W4 versus W5 W6
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.9482332 1.0107332
sample estimates:
mean of the differences
0.03125
W1 W2 W3 W4 versus W5 W6 W7
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.9482332 1.0107332
sample estimates:
mean of the differences
0.03125
W1 W2 W3 W4 versus W5 W6 W7 W8
Paired t-test
data: avgA and avgB
t = 0.10153, df = 3, p-value = 0.9255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.9482332 1.0107332
sample estimates:
mean of the differences
0.03125
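The printed output gets long quickly. As a possible variation, the key numbers of each test can be collected into one data frame with a row per comparison. This is a sketch that reuses the example data and the same paired t-test on row averages; the helper `summarise_comparison` and the column names of `results` are made up for this illustration.

```r
df <- data.frame(W1 = c(2, 0, 4, 2.5), W2 = c(3, 1, 2, 6),
                 W3 = c(1, 2, 3, 4),   W4 = c(1, 2, 3, 4),
                 W5 = c(1, 2, 3, 4),   W6 = c(1, 2, 3, 4),
                 W7 = c(1, 2, 3, 4),   W8 = c(1, 2, 3, 4))
groupA <- c("W1", "W2", "W3", "W4")
groupB <- c("W5", "W6", "W7", "W8")

# same list of comparisons as in the answer
comparisons <- list()
for (a_len in seq_along(groupA)) for (b_len in seq_along(groupB)) {
  comp <- list(A = head(groupA, a_len), B = head(groupB, b_len))
  comparisons <- append(comparisons, list(comp))
}

# hypothetical helper: run the paired t-test for one comparison and
# return its key numbers as a one-row data frame
summarise_comparison <- function(groups) {
  avgA <- rowMeans(df[groups$A])
  avgB <- rowMeans(df[groups$B])
  test <- t.test(avgA, avgB, paired = TRUE)
  data.frame(
    A         = paste(groups$A, collapse = " "),
    B         = paste(groups$B, collapse = " "),
    mean_diff = unname(test$estimate),
    t         = unname(test$statistic),
    p_value   = test$p.value
  )
}

# stack the one-row data frames into a single summary table
results <- do.call(rbind, lapply(comparisons, summarise_comparison))
print(results)
```

Swapping in a different test only requires changing the body of `summarise_comparison`; the selection of weeks stays the same.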
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Caspar V. |
