Mean calculation between dataframes in list in R

I have a list of dataframes that all have the same format (same number of rows, same number of columns and columns have the same name).

I would like to create a new dataframe with the same columns as the dataframes in the list, as shown below.

List:

[[1]]
X Y Z
0.1 0.3 3
0.1 0.4 4
0.2 0.4 5
[[2]]
X Y Z
0.1 0.3 4
0.1 0.4 5
0.2 0.4 6
[[3]]
X Y Z
0.1 0.3 5
0.1 0.4 6
0.2 0.4 7

The result I would like is a dataframe like

Desired Output with mean calculation (only column Z):

X Y Z
0.1 0.3 4
0.1 0.4 5
0.2 0.4 6

Desired Output with sd calculation (only column Z):

X Y Z
0.1 0.3 1
0.1 0.4 1
0.2 0.4 1

Columns X and Y are the same in the output as in all dataframes, and column Z is the row-wise mean or standard deviation across the dataframes.

4, 5 and 6 are the means across the three dataframes, and 1 is the standard deviation of (3, 4, 5), (4, 5, 6) or (5, 6, 7). Unlike in this example, my list can contain many dataframes (around 100).

If anyone has a clue, I would appreciate the help.



Solution 1:[1]

Place the datasets in a list, take the elementwise sum (`+`) with Reduce, and divide by the length of the list. Because X and Y are identical in every dataframe, their means are unchanged.

Reduce(`+`, lst1)/length(lst1)

-output

    X   Y Z
1 0.1 0.3 4
2 0.1 0.4 5
3 0.2 0.4 6
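
The Reduce call above averages every column, which works here because X and Y are identical across the dataframes. If only Z should be summarised, as the question asks, one possible sketch is to collect the Z columns into a matrix and summarise its rows. The names z_mat, mean_df and sd_df are illustrative, and lst1 is rebuilt here so the snippet runs on its own.

```r
# Example data: three dataframes with identical X and Y and different Z
lst1 <- list(
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 3:5),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 4:6),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 5:7)
)

# One column per dataframe, one row per original row
z_mat <- sapply(lst1, `[[`, "Z")

# Keep X and Y from the first dataframe and replace Z with the summary
mean_df <- lst1[[1]]
mean_df$Z <- rowMeans(z_mat)

sd_df <- lst1[[1]]
sd_df$Z <- apply(z_mat, 1, sd)
```

With the example data, mean_df$Z is 4, 5, 6 and sd_df$Z is 1, 1, 1, matching the desired output in the question.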

Another option is to convert the list to a 3D array (rows × columns × dataframes) and then apply rowMeans over the column MARGIN (2), which averages each column across the dataframes

apply(array(unlist(lst1), c(dim(lst1[[1]]), length(lst1))),
    2, rowMeans)
     [,1] [,2] [,3]
[1,]  0.1  0.3    4
[2,]  0.1  0.4    5
[3,]  0.2  0.4    6
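
The array form also generalises to other summaries such as the standard deviation. The following is a sketch under the assumption that all dataframes are numeric and share the same dimensions; arr, mean_mat and sd_mat are illustrative names. Applying over both the row and column margins passes the values of one cell across all dataframes to the summary function.

```r
# Example data, rebuilt so the snippet runs on its own
lst1 <- list(
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 3:5),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 4:6),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 5:7)
)

# Stack the dataframes into a rows x columns x dataframes array,
# keeping the column names as dimnames
arr <- array(
  unlist(lst1),
  dim = c(dim(lst1[[1]]), length(lst1)),
  dimnames = list(NULL, names(lst1[[1]]), NULL)
)

# MARGIN = c(1, 2): summarise each (row, column) cell across the third dimension
mean_mat <- apply(arr, c(1, 2), mean)
sd_mat <- apply(arr, c(1, 2), sd)

# Convert back to dataframes if needed
mean_df <- as.data.frame(mean_mat)
sd_df <- as.data.frame(sd_mat)
```

In sd_df the X and Y columns are numerically zero because they do not vary between dataframes, so copy them from lst1[[1]] if the original values are wanted in the result.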

Another option is to bind the datasets with bind_rows, create a row index within each dataset (rowid from data.table), group by that index, and get the mean of the columns

library(dplyr)
library(data.table)
bind_rows(lst1, .id = 'id') %>%
   group_by(id = rowid(id)) %>%
   summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = 'drop') %>%
   select(-id)
# A tibble: 3 × 3
      X     Y     Z
  <dbl> <dbl> <dbl>
1   0.1   0.3     4
2   0.1   0.4     5
3   0.2   0.4     6
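
If the data.table dependency is not wanted, a similar pipeline can be sketched with dplyr alone, and it can return the mean and standard deviation of Z side by side. This assumes the rows are in the same order in every dataframe; the column names row, Z_mean and Z_sd are illustrative.

```r
library(dplyr)

# Example data, rebuilt so the snippet runs on its own
lst1 <- list(
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 3:5),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 4:6),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 5:7)
)

res <- bind_rows(lst1, .id = "id") %>%
  group_by(id) %>%
  mutate(row = row_number()) %>%  # row position within each dataframe
  group_by(row) %>%               # regroup by row position
  summarise(
    X = first(X),                 # X and Y are identical across dataframes
    Y = first(Y),
    Z_mean = mean(Z),
    Z_sd = sd(Z)
  ) %>%
  select(-row)
```

Here res has one row per original row, with Z_mean equal to 4, 5, 6 and Z_sd equal to 1, 1, 1 for the example data.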

data

lst1 <- list(
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 3:5),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 4:6),
  data.frame(X = c(0.1, 0.1, 0.2), Y = c(0.3, 0.4, 0.4), Z = 5:7)
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1