Mean calculation between dataframes in a list in R
I have a list of dataframes that all have the same format (same number of rows, same number of columns, and the same column names).
I would like to create a new dataframe with the same columns as the dataframes in the list, as shown below.
List:
- [[1]]
| X | Y | Z |
|---|---|---|
| 0.1 | 0.3 | 3 |
| 0.1 | 0.4 | 4 |
| 0.2 | 0.4 | 5 |
- [[2]]
| X | Y | Z |
|---|---|---|
| 0.1 | 0.3 | 4 |
| 0.1 | 0.4 | 5 |
| 0.2 | 0.4 | 6 |
- [[3]]
| X | Y | Z |
|---|---|---|
| 0.1 | 0.3 | 5 |
| 0.1 | 0.4 | 6 |
| 0.2 | 0.4 | 7 |
The result I would like is a dataframe like one of the following.
Desired Output with mean calculation (only column Z):
| X | Y | Z |
|---|---|---|
| 0.1 | 0.3 | 4 |
| 0.1 | 0.4 | 5 |
| 0.2 | 0.4 | 6 |
Desired Output with sd calculation (only column Z):
| X | Y | Z |
|---|---|---|
| 0.1 | 0.3 | 1 |
| 0.1 | 0.4 | 1 |
| 0.2 | 0.4 | 1 |
Columns X and Y are the same in the output as in all of the dataframes, and column Z holds the mean or the standard deviation.
4, 5 and 6 are the means of Z across the three dataframes, and 1 is the standard deviation of (3, 4, 5), (4, 5, 6) and (5, 6, 7). Unlike in this example, my list can contain many dataframes (around 100).
Does anyone have an idea how to do this?
Solution 1:[1]
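Place the datasets in a list, get the element-wise sum (+) with Reduce, and divide by the length of the list. This averages every column; because X and Y are identical in all datasets, their means equal the original values.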
Reduce(`+`, lst1)/length(lst1)
-output
X Y Z
1 0.1 0.3 4
2 0.1 0.4 5
3 0.2 0.4 6
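Since the question asks for the mean of column Z only, with X and Y carried over unchanged, a minimal base R sketch could copy the first data frame and overwrite Z. It assumes every data frame has its rows in the same order; `res_mean` is an illustrative name, not something from the original answer.

```r
# Same example data as in the question
lst1 <- list(
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 3:5),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 4:6),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 5:7)
)

# Start from the first data frame so X and Y are kept exactly as they are
res_mean <- lst1[[1]]

# Sum the Z columns element-wise, then divide by the number of data frames
res_mean$Z <- Reduce(`+`, lapply(lst1, `[[`, "Z")) / length(lst1)

res_mean
```

Touching only Z avoids any floating-point drift in X and Y that summing and dividing could introduce.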
Another option is to convert the list to a 3D array (rows × columns × datasets) and apply rowMeans over the column margin (MARGIN = 2), which returns a matrix
apply(array(unlist(lst1), c(dim(lst1[[1]]), length(lst1))),
2, rowMeans)
[,1] [,2] [,3]
[1,] 0.1 0.3 4
[2,] 0.1 0.4 5
[3,] 0.2 0.4 6
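The answer above only covers the mean. For the standard deviation of Z that the question also asks for, one possible base R sketch is to put the Z columns side by side in a matrix and apply `sd` row by row. The names `z_mat` and `res_sd` are illustrative, and the sketch assumes the rows are in the same order in every data frame.

```r
# Same example data as in the question
lst1 <- list(
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 3:5),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 4:6),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 5:7)
)

# One column per data frame, one row per row position
z_mat <- do.call(cbind, lapply(lst1, function(d) d$Z))

# Keep X and Y from the first data frame, replace Z by the row-wise sd
res_sd <- lst1[[1]]
res_sd$Z <- apply(z_mat, 1, sd)

res_sd
```

Note that `sd` uses the sample standard deviation (denominator n - 1), which matches the value of 1 given in the question for (3, 4, 5).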
A third option is to bind the datasets with bind_rows, create a row index within each dataset (rowid from data.table), group by that index, and get the mean of the columns
library(dplyr)
library(data.table)
bind_rows(lst1, .id = 'id') %>%
group_by(id = rowid(id)) %>%
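# na.rm is passed inside a lambda; across(..., mean, na.rm = TRUE) is deprecated since dplyr 1.1.0
summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = 'drop') %>%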
select(-id)
# A tibble: 3 × 3
X Y Z
<dbl> <dbl> <dbl>
1 0.1 0.3 4
2 0.1 0.4 5
3 0.2 0.4 6
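The same pipeline idea can be adapted to return both statistics for Z in one pass. The following is a sketch under the assumption that dplyr is installed and that rows line up by position across the data frames; it uses `row_number()` instead of `data.table::rowid`, and the column names `Z_mean` and `Z_sd` are made up for illustration.

```r
library(dplyr)

# Same example data as in the question
lst1 <- list(
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 3:5),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 4:6),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 5:7)
)

z_summary <- bind_rows(lst1, .id = "id") %>%
  group_by(id) %>%
  mutate(row = row_number()) %>%  # position of each row within its data frame
  group_by(row) %>%               # regroup by row position across data frames
  summarise(
    X = first(X),                 # X and Y are identical across data frames
    Y = first(Y),
    Z_mean = mean(Z),
    Z_sd = sd(Z),
    .groups = "drop"
  ) %>%
  select(-row)

z_summary
```

With around 100 data frames this stays a single grouped summary, so it should scale comfortably for tables of this kind.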
data
lst1 <- list(structure(list(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4
), Z = 3:5),
class = "data.frame", row.names = c(NA, -3L)), structure(list(
X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 4:6),
class = "data.frame", row.names = c(NA,
-3L)), structure(list(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4
), Z = 5:7), class = "data.frame", row.names = c(NA, -3L)))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
