R - Scaling numeric values only in a dataframe with mixed types
I am working with a data frame that has mixed data types (numeric and character) and a character key as the primary identifier. I'd like to scale and center the numeric variables. I've tried the scale() function, but it requires all fields to be numeric. If I subset to just the numeric fields and scale those, I have to drop the character identifier.
My ideal end state is that I have a data frame with character fields and scaled numeric fields.
I realize this is a newbie question, so please be gentle ;-)
Thanks!
Jim
Solution 1:[1]
Something like this should do what you want:
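```r
library(MASS)  # provides the anorexia data set
# TRUE for each numeric column (Prewt, Postwt), FALSE for the factor Treat
ind <- sapply(anorexia, is.numeric)
# Scale only the numeric columns, leaving Treat untouched
anorexia[ind] <- lapply(anorexia[ind], scale)
```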
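One detail worth knowing: scale() returns a one-column matrix carrying "scaled:center" and "scaled:scale" attributes rather than a plain vector, so the assigned columns can keep that matrix structure. A minimal base-R sketch that wraps scale() in as.vector() to get plain numeric columns, using a small hypothetical data frame with a character key like the one in the question (df, id, weight and height are made-up names for illustration):

```r
# Hypothetical example data: a character key plus numeric columns
df <- data.frame(
  id     = c("a", "b", "c", "d"),
  weight = c(60, 72, 85, 90),
  height = c(160L, 175L, 180L, 190L),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for numeric (double or integer) columns
ind <- sapply(df, is.numeric)

# as.vector() drops the matrix dim and the scaled:* attributes,
# so each scaled column is a plain numeric vector
df[ind] <- lapply(df[ind], function(x) as.vector(scale(x)))

str(df)
```

The id column stays character, while weight and height become plain numeric vectors with mean 0 and standard deviation 1.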
Solution 2:[2]
This can be done straightforwardly using dplyr::mutate_if:
```r
library(dplyr)
# Apply scale() to every column for which is.numeric() is TRUE
iris %>%
  mutate_if(is.numeric, scale)
```
Solution 3:[3]
The code below does not need any external packages:
```r
# Scale all numeric columns in a data frame.
# df is your data frame
performScaling <- TRUE  # Turn it on/off for experimentation.
if (performScaling) {
  # Loop over each column.
  for (colName in names(df)) {
    # Check if the column contains numeric data. is.numeric() is TRUE for
    # integer and double columns and always returns a single value, unlike
    # class(), which can return several strings and break the if().
    if (is.numeric(df[[colName]])) {
      # Scale this column (scale() applies z-scaling by default).
      df[, colName] <- scale(df[, colName])
    }
  }
}
```
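The loop's comment says scale() applies z-scaling. With its defaults (center = TRUE, scale = TRUE), that means subtracting the mean and dividing by the standard deviation, which uses the usual n - 1 denominator. A small check against a hand-written version (z_manual is a hypothetical helper for illustration, not part of base R):

```r
# z-scaling by hand: subtract the mean, divide by the standard deviation
# (sd() uses the n - 1 denominator)
z_manual <- function(x) (x - mean(x)) / sd(x)

x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# scale() with its defaults, flattened to a plain vector
z_scale <- as.vector(scale(x))

print(all.equal(z_scale, z_manual(x)))  # TRUE
```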
Solution 4:[4]
This is really the same approach as Marius's, except that mutate_if() has been superseded by across():
```r
library(dplyr)
# across() applies scale() to every column selected by where(is.numeric)
iris %>%
  mutate(across(where(is.numeric), scale))
```
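Applied to data shaped like the question's (a character key plus numeric columns), the same across() call leaves the key untouched because where(is.numeric) never selects it. A minimal sketch, assuming dplyr 1.0.0 or later and hypothetical column names; the lambda wraps scale() in as.vector() so the results are plain vectors rather than one-column matrices:

```r
library(dplyr)

# Hypothetical data: a character key plus numeric columns
df <- data.frame(
  id     = c("a", "b", "c", "d"),
  weight = c(60, 72, 85, 90),
  height = c(160L, 175L, 180L, 190L),
  stringsAsFactors = FALSE
)

# where(is.numeric) selects weight and height only; id is left as-is.
# as.vector() strips the matrix structure returned by scale().
df_scaled <- df %>%
  mutate(across(where(is.numeric), ~ as.vector(scale(.x))))

print(df_scaled)
```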
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Hong Ooi |
| Solution 2 | Marius |
| Solution 3 | stackoverflowuser2010 |
| Solution 4 | Denis Kazakov |
