How can I modify selected columns (given by a string vector) of a data frame?
I am new to R. I have a data frame (a data.table called dt_All) and I need to normalize the data in all columns except those whose names are given in a string vector.
When I access the columns by index, it works. When I try to access them by column name, it doesn't.
Say I want to normalize the data in all columns by a column called "Awake", except for the columns listed below. Their column indices are 1:8 and the last column of the data frame. After the exclusion, the remaining columns are numeric.
# Find the variable names to normalize: keep every column except the ones listed below
excldVarNms <- setdiff(names(dt_All), c("SubjectID", "Date", "Day", "Weekend", "WorkDay", "LeisureDay", "DayStart", "DayStop", "Domain"))
#this line works - normalize dataframe by "Awake" time
dt_All[,9:(ncol(dt_All)-1)]=(dt_All[,9:(ncol(dt_All)-1)]/dt_All$Awake)*480
#this expression doesn't work
dt_All[,excldVarNms,with=FALSE]=(dt_All[,excldVarNms,with=FALSE]/dt_All$Awake)*480
Error in [<-.data.table(*tmp*, , plotVarNms, with = FALSE, value = list( : unused argument (with = FALSE)
#this also fails obviously because data.table thinks excldVarNms is a column name
dt_All[,excldVarNms]=(dt_All[,excldVarNms]/dt_All$Awake)*480
Error in [.data.table(dt_All, , excldVarNms) : j (the 2nd argument inside [...]) is a single symbol but column name 'excldVarNms' is not found.
Any thoughts on what's happening?
Solution 1:[1]
Sample data:
library(data.table)
samp <- setDT(structure(list(group = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"), ignore = c(3, 7, 4, 2, 10, 6, 8, 1, 5, 9), V1 = c(3, 10, 2, 1, 8, 4, 7, 5, 9, 6), V2 = c(3, 7, 9, 6, 4, 8, 2, 5, 10, 1), V3 = c(8, 4, 10, 9, 5, 6, 7, 2, 1, 3)), row.names = c(NA, -10L), class = c("data.table", "data.frame")))
samp
# group ignore V1 V2 V3
# <char> <num> <num> <num> <num>
# 1: A 3 3 3 8
# 2: A 7 10 7 4
# 3: A 4 2 9 10
# 4: A 2 1 6 9
# 5: A 10 8 4 5
# 6: A 6 4 8 6
# 7: B 8 7 2 7
# 8: B 1 5 5 2
# 9: B 5 9 10 1
# 10: B 9 6 1 3
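Scaling (normalizing) per group with a set of columns. In this method, cols is a character vector naming the columns to modify; wrapping it in parentheses on the left of := tells data.table to use the vector's contents as column names rather than to look for a column literally called cols.
# columns to scale (inferred from the output below, where only V1, V2 and V3 change)
cols <- c("V1", "V2", "V3")
samp[, (cols) := lapply(.SD, function(z) as.numeric(scale(z))), by = .(group), .SDcols = cols]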
samp
# group ignore V1 V2 V3
# <char> <num> <num> <num> <num>
# 1: A 3 -0.4682929 -1.36694185 0.42257713
# 2: A 7 1.4985373 0.35972154 -1.26773138
# 3: A 4 -0.7492686 1.22305323 1.26773138
# 4: A 2 -1.0302444 -0.07194431 0.84515425
# 5: A 10 0.9365858 -0.93527600 -0.84515425
# 6: A 6 -0.1873172 0.79138739 -0.42257713
# 7: B 8 0.1463850 -0.61858957 1.42587956
# 8: B 1 -1.0246951 0.12371791 -0.47529319
# 9: B 5 1.3174651 1.36089706 -0.85552774
# 10: B 9 -0.4391550 -0.86602540 -0.09505864
(N.B.: I would prefer to have been able to use lapply(.SD, scale), but since scale returns a one-column matrix carrying extra attributes rather than a plain vector, data.table gets confused and will error; wrapping that with as.numeric strips the matrix structure and it operates as it should. You may not need to do this with your own normalization, depending on how it is implemented.)
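Applied to the question's own normalization (divide by Awake, multiply by 480), the same pattern might look like the sketch below. The table dt and the columns Sitting and Walking are hypothetical stand-ins for dt_All, which is not shown in the question; only SubjectID, Awake and Domain are names taken from it.

```r
library(data.table)

# Hypothetical stand-in for dt_All; Sitting and Walking are invented columns
dt <- data.table(
  SubjectID = c("s1", "s2", "s3"),
  Awake     = c(960, 480, 240),
  Sitting   = c(480, 120, 60),
  Walking   = c(96, 240, 120),
  Domain    = c("x", "y", "z")
)

# Columns to leave untouched; every other column gets normalized
keepCols <- c("SubjectID", "Domain")
normCols <- setdiff(names(dt), keepCols)

# (normCols) in parentheses is evaluated to a vector of column names;
# .SDcols restricts .SD to those same columns
dt[, (normCols) := lapply(.SD, function(x) x / Awake * 480), .SDcols = normCols]
```

Because Awake is itself among the normalized columns here (as it appears to be in the question), it ends up equal to 480 in every row. All right-hand-side values are computed from the original Awake before any column is overwritten.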
It would be nice to be able to use patterns(.) or similar for .SDcols; the only problem with that is that reassignment of those dynamic columns is not as clear (see data.table#4163 for the fix that would work as samp[, names(.SD) := lapply(.SD, function(z) as.numeric(scale(z))), .SDcols = patterns("^V")].)
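One way to get pattern-based selection without relying on names(.SD) := is to compute the matching names up front with base R's grep and then reuse them on both sides. This is only a sketch on a smaller, made-up table, not the sample data above.

```r
library(data.table)

# Small made-up table: two groups, one column to ignore, two "V" columns
samp2 <- data.table(
  group  = c("A", "A", "A", "B", "B", "B"),
  ignore = c(3, 7, 4, 8, 1, 5),
  V1     = c(1, 2, 3, 10, 20, 30),
  V2     = c(2, 4, 6, 5, 5, 8)
)

# Resolve the pattern to concrete column names before the assignment
cols <- grep("^V", names(samp2), value = TRUE)

# Scale each matched column within each group; other columns are untouched
samp2[, (cols) := lapply(.SD, function(z) as.numeric(scale(z))), by = .(group), .SDcols = cols]
```

Since cols is an ordinary character vector, the same vector serves as the assignment target and as .SDcols, so the two cannot drift apart.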
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | r2evans |
