R - Scaling numeric values only in a dataframe with mixed types

I am working with a data frame that has mixed data types (numeric and character) and also has a character key as the primary identifier. I'd like to scale and center the numeric variables. I've tried using the scale() function, but it requires all fields to be numeric. When I take just the numeric fields and scale them, I have to drop the character identifier to be able to scale them.

My ideal end state is that I have a data frame with character fields and scaled numeric fields.

I realize this is a newbie question, so please be gentle ;-)

Thanks!

Jim

r


Solution 1:[1]

Something like this should do what you want:

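library(MASS)  # only needed for the anorexia example data set
ind <- sapply(anorexia, is.numeric)            # TRUE for each numeric column
anorexia[ind] <- lapply(anorexia[ind], scale)  # center and scale only those columns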
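
One detail worth knowing: scale() returns a one-column matrix (carrying "scaled:center" and "scaled:scale" attributes) rather than a plain vector, so each scaled column is stored as a matrix inside the data frame. A minimal sketch of how to get ordinary numeric vectors instead, using a small made-up data frame (the names df, id, height and weight are illustrative, not from the question):

```r
# Toy data: a character key plus two numeric columns (hypothetical example data).
df <- data.frame(
  id     = c("a", "b", "c", "d"),
  height = c(150, 160, 170, 180),
  weight = c(50, 65, 70, 95),
  stringsAsFactors = FALSE
)

# Logical vector marking the numeric columns.
ind <- sapply(df, is.numeric)

# as.numeric() drops the matrix dimensions and attributes that scale() adds.
df[ind] <- lapply(df[ind], function(x) as.numeric(scale(x)))

str(df)  # id stays character; height and weight are plain numeric vectors
```

Whether the matrix form matters depends on what happens next; many functions accept it, but some printing and joining code is easier to reason about with plain vectors.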

Solution 2:[2]

This can be done straightforwardly using dplyr::mutate_if:

library(dplyr)

iris %>%
    mutate_if(is.numeric, scale)

Solution 3:[3]

The code below does not need any external library:

# Scale all numeric columns in a data frame.
# df is your data frame

performScaling <- TRUE  # Turn it on/off for experimentation.

if (performScaling) {

    # Loop over each column.
    for (colName in names(df)) {

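        # Check if the column contains numeric data (integer or double).
        # is.numeric() is safer than comparing class(), which can return
        # more than one value.
        if (is.numeric(df[[colName]])) {

            # Scale this column (scale() applies z-scaling: subtract the
            # mean, then divide by the standard deviation).
            df[[colName]] <- scale(df[[colName]])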
        }
    }
}
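
To try the loop on something concrete, it can be wrapped in a function and run on a small made-up data frame. This is only a sketch: scale_numeric_columns, patients and the toy columns are hypothetical names, and the type check here is done with is.numeric().

```r
# Hypothetical helper: z-scale every numeric column, leave the rest untouched.
scale_numeric_columns <- function(df) {
  for (colName in names(df)) {
    if (is.numeric(df[[colName]])) {
      # as.numeric() keeps the column a plain vector instead of a 1-column matrix.
      df[[colName]] <- as.numeric(scale(df[[colName]]))
    }
  }
  df
}

# Made-up example data with a character key.
patients <- data.frame(
  key   = c("p1", "p2", "p3"),
  age   = c(30L, 40L, 50L),
  score = c(1.5, 2.5, 6.5),
  stringsAsFactors = FALSE
)

scaled <- scale_numeric_columns(patients)

# Each scaled column should have mean ~0 and standard deviation 1.
sapply(scaled[sapply(scaled, is.numeric)],
       function(x) c(mean = mean(x), sd = sd(x)))
```

The key column comes back unchanged, so the scaled values stay attached to their identifiers.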

Solution 4:[4]

This is essentially the same approach as proposed by Marius, except that mutate_if has been superseded by across:

library(dplyr)

iris %>%
    mutate(across(where(is.numeric), scale))
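
If the original values should be kept alongside the scaled ones, across() accepts a .names argument. A sketch, assuming dplyr 1.0.0 or later; the _scaled suffix and the scaled_iris name are arbitrary choices, and as.numeric() is used so the new columns are plain vectors rather than one-column matrices:

```r
library(dplyr)

# Add a "<column>_scaled" copy of every numeric column; originals are kept.
scaled_iris <- iris %>%
    mutate(across(where(is.numeric),
                  ~ as.numeric(scale(.x)),
                  .names = "{.col}_scaled"))

names(scaled_iris)
```

Without .names, the scaled values overwrite the original columns, as in the shorter call above.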

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Hong Ooi
Solution 2 Marius
Solution 3 stackoverflowuser2010
Solution 4 Denis Kazakov