R - sapply Over Columns, then lapply Over Elements
I've spent an hour on this, so I'm curious whether it is possible. I am trying to transform each element of each column in a data frame. The transformation applied to each element depends on the mean and standard deviation of the column that element is in. I wanted to use nested lapply() or sapply() calls to do this, but ran into some unforeseen issues. My current "solution" (which does not work as expected) is:
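```r
library(matrixStats)  # provides colSds()

scale_variables <- function(dframe, columns) {
  # Per-column means and SDs of the numeric columns
  means <- colMeans(dframe[sapply(dframe, is.numeric)])
  sds <- colSds(as.matrix(dframe[sapply(dframe, is.numeric)]))
  # Outer loop: one pass per mean index m
  new_dframe <- lapply(seq_along(means), FUN = function(m) {
    # Middle loop: every column in `columns`
    sapply(dframe[ , columns], FUN = function(x) {
      # Inner loop: every element of that column
      sapply(x, FUN = helper_func, means[[m]], sds[m])
    })
  })
  return(new_dframe)
}
```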
I calculate the column means and SDs beforehand. Then I use seq_along() to loop over the index of each mean in means, the first sapply() to loop over each of the columns, and the second sapply() to loop over each element. I get the mean and SD of the current column using the index m, then pass the current element, mean, and SD to the helper function.
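To make the loop structure concrete, here is a stripped-down sketch (not the original function) that keeps the outer lapply() and the first sapply() but, instead of transforming values, records which mean gets paired with which column. It assumes the four numeric iris columns; the names `columns` and `pairings` are only for illustration.

```r
# Assumption: the four numeric iris columns, as in the question's example
columns <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
means <- colMeans(iris[columns])

# Same nesting as the question: the outer loop runs over mean indices m,
# and the inner sapply() visits *every* column for each m
pairings <- lapply(seq_along(means), function(m) {
  sapply(columns, function(col) paste(names(means)[m], "->", col))
})

length(unlist(pairings))  # 16 pairings, not 4
```

The 16 pairings include combinations such as "Sepal.Length -> Petal.Width", i.e. every mean is applied to every column rather than only to its own.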
Running this on the numeric variables in the iris dataset yields this monstrosity:
```
'data.frame': 150 obs. of 16 variables:
 $ Sepal.Length : num -0.898 -1.139 -1.381 -1.501 -1.018 ...
 $ Sepal.Width : num -2.83 -3.43 -3.19 -3.31 -2.71 ...
 $ Petal.Length : num -5.37 -5.37 -5.49 -5.25 -5.37 ...
 $ Petal.Width : num -6.82 -6.82 -6.82 -6.82 -6.82 ...
 $ Sepal.Length.1: num 4.69 4.23 3.77 3.54 4.46 ...
 $ Sepal.Width.1 : num 1.0156 -0.1315 0.3273 0.0979 1.245 ...
 $ Petal.Length.1: num -3.8 -3.8 -4.03 -3.57 -3.8 ...
 $ Petal.Width.1 : num -6.56 -6.56 -6.56 -6.56 -6.56 ...
 $ Sepal.Length.2: num 0.76 0.647 0.534 0.477 0.704 ...
 $ Sepal.Width.2 : num -0.1462 -0.4294 -0.3161 -0.3727 -0.0895 ...
 $ Petal.Length.2: num -1.34 -1.34 -1.39 -1.28 -1.34 ...
 $ Petal.Width.2 : num -2.02 -2.02 -2.02 -2.02 -2.02 ...
 $ Sepal.Length.3: num 5.12 4.86 4.59 4.46 4.99 ...
 $ Sepal.Width.3 : num 3.02 2.36 2.62 2.49 3.15 ...
 $ Petal.Length.3: num 0.263 0.263 0.132 0.394 0.263 ...
 $ Petal.Width.3 : num -1.31 -1.31 -1.31 -1.31 -1.31 ...
```
I assume I am applying each mean in means to every column of the data frame in turn, when I only want to apply it to the elements of the column it belongs to. So I'm not sure that nesting apply functions this way will do what I need. Can it be done like this?
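Nested apply calls can work if the outer loop runs over the columns instead of over the means, so each column only ever sees its own statistics. A minimal sketch, assuming helper_func is (x - m) / sd (the question does not show it) and using the hypothetical name scale_variables_nested:

```r
# Assumed helper: standardise a value with a given mean and SD
helper_func <- function(x, m, sd) (x - m) / sd

scale_variables_nested <- function(dframe, columns) {
  means <- colMeans(dframe[columns])
  sds <- vapply(dframe[columns], sd, numeric(1))
  # Outer loop over column names, so each column is paired only with
  # its own mean and SD (looked up by name)
  new_cols <- lapply(columns, function(col) {
    # Inner loop over the elements of that one column
    sapply(dframe[[col]], helper_func, m = means[[col]], sd = sds[[col]])
  })
  names(new_cols) <- columns
  as.data.frame(new_cols)
}

res <- scale_variables_nested(iris, names(iris)[sapply(iris, is.numeric)])
head(res)
```

The inner sapply() is kept only to mirror the original nesting; because (x - m) / sd is vectorised, helper_func could also be applied to each whole column directly.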
Solution 1:[1]
I'm not sure what your helper_func is, but I've made a toy example:
```r
helper_func <- function(x, m, sd) (x - m) / sd
```
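Because this toy helper uses only vectorised arithmetic, an element-by-element sapply() is not needed: calling it on a whole column gives the same values. A quick check on iris$Sepal.Length (the variable names here are illustrative):

```r
helper_func <- function(x, m, sd) (x - m) / sd

x <- iris$Sepal.Length
m <- mean(x)
s <- sd(x)

# Element by element, as the innermost sapply() in the question does
per_element <- sapply(x, helper_func, m = m, sd = s)

# Whole column at once: `-` and `/` already work elementwise on vectors
whole_column <- helper_func(x, m, s)

all.equal(per_element, whole_column)
```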
You can then adjust your scale_variables() function like this:
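```r
scale_variables <- function(dframe, columns) {
  # Column-wise means and SDs, each named by its column
  means <- apply(dframe[columns], 2, mean, na.rm = TRUE)
  sds <- apply(dframe[columns], 2, sd, na.rm = TRUE)
  # For each column name, scale that column with its own mean and SD
  # (the \(col) lambda shorthand requires R >= 4.1.0)
  sapply(columns, \(col) helper_func(dframe[[col]], m = means[col], sd = sds[col]))
}
```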
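If you want the pairing of each column with its own statistics to be explicit, Map() can walk the columns, means and SDs in parallel. This is an alternative sketch, not part of the answer above; scale_variables_map is a hypothetical name, and it returns a data frame rather than the matrix that sapply() produces:

```r
helper_func <- function(x, m, sd) (x - m) / sd

scale_variables_map <- function(dframe, columns) {
  cols <- dframe[columns]
  means <- vapply(cols, mean, numeric(1), na.rm = TRUE)
  sds <- vapply(cols, sd, numeric(1), na.rm = TRUE)
  # Map() calls helper_func(column i, mean i, SD i) for each i,
  # keeping the column names from `cols`
  as.data.frame(Map(helper_func, cols, means, sds))
}

res_map <- scale_variables_map(iris, names(iris)[sapply(iris, is.numeric)])
head(res_map)
```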
And call scale_variables() like this:
```r
scale_variables(iris, names(iris)[sapply(iris, is.numeric)])
```
Output (first 6 of 150 rows):
```
  Sepal.Length Sepal.Width Petal.Length   Petal.Width
1  -0.89767388  1.01560199  -1.33575163 -1.3110521482
2  -1.13920048 -0.13153881  -1.33575163 -1.3110521482
3  -1.38072709  0.32731751  -1.39239929 -1.3110521482
4  -1.50149039  0.09788935  -1.27910398 -1.3110521482
5  -1.01843718  1.24503015  -1.33575163 -1.3110521482
6  -0.53538397  1.93331463  -1.16580868 -1.0486667950
```
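As a sanity check, base R's scale() with its defaults (center = TRUE, scale = TRUE) also subtracts each column's mean and divides by its standard deviation, so it should reproduce these values. It returns a matrix, with the centring and scaling values stored in the "scaled:center" and "scaled:scale" attributes:

```r
numeric_cols <- iris[sapply(iris, is.numeric)]

# Centre each column on its mean and divide by its standard deviation
scaled <- scale(numeric_cols)

head(scaled)
```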
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | langtang |
