Nice way to group data in a `data.table` when the new column name is given as a character vector
In other words, my question is about the `j` argument to `data.table` when the name of the new column is stored in a character vector. For example:
library(data.table)
dt <- data.table(x = c(1, 1, 2, 2, 3, 3), y = rnorm(6))
agg_col_name <- 'avg'
grouped_dt <- dt[, .(z = mean(y)), by = x]
setnames(grouped_dt, 'z', agg_col_name)
> grouped_dt
   x        avg
1: 1 -0.2554987
2: 2 -0.4245852
3: 3 -0.4881073
There should be a more elegant way to do the last two statements as one, yes?
Perhaps this is really a question about how to build a suitable list for the `j` argument.
Solution 1:[1]
Although this is probably not what you are looking for, you could use `setNames` inside `j`, wrapping it around `.(z = mean(y))`.
library(data.table)
dt[, setNames(.(z = mean(y)), agg_col_name), by = x]
Or use `setnames` after doing the summary (an unnamed `j` result gets the default column name `V1`):
setnames(dt[, mean(y), by = x], 'V1', agg_col_name)[]
Output
   x        avg
1: 1  0.5626526
2: 2  0.3549653
3: 3 -0.2861405
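Since the question is really about building the list for `j`, here is a minimal sketch of the same `setNames` idea using a plain `list()`, extended to several output columns. The extra column `w` and the vector `agg_col_names` are hypothetical and only there for illustration. Note that wrapping `j` in `setNames` probably prevents data.table from applying its internal grouped-`mean` optimisation, so it may be slower on large data.

```r
library(data.table)

set.seed(42)  # only so the example is reproducible
# `w` is a hypothetical second value column, not part of the original data
dt <- data.table(x = c(1, 1, 2, 2, 3, 3), y = rnorm(6), w = runif(6))

# Output names supplied as character vectors
agg_col_name  <- "avg"
agg_col_names <- c("avg_y", "avg_w")  # hypothetical names

# One aggregate: name a plain list() with setNames() inside j
one <- dt[, setNames(list(mean(y)), agg_col_name), by = x]

# Several aggregates: lapply() over .SD returns a list, renamed the same way
many <- dt[, setNames(lapply(.SD, mean), agg_col_names),
           by = x, .SDcols = c("y", "w")]

names(one)   # "x" "avg"
names(many)  # "x" "avg_y" "avg_w"
```

The order of `agg_col_names` has to match the order of `.SDcols`, since the names are assigned by position.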
However, as mentioned in the comments, this is easier with the development version of data.table. You can read more about the development of this feature at [programming on data.table #4304](https://github.com/Rdatatable/data.table/pull/4304).
# Latest development version:
data.table::update.dev.pkg()
library(data.table)
dt[, .(z = mean(y)), by = x, env = list(z=agg_col_name)]
#    x        avg
# 1: 1 -0.1640783
# 2: 2  0.5375794
# 3: 3  0.1539785
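The `env` approach is most useful when the names arrive as function arguments. Below is a rough sketch, assuming an installed data.table that supports `env` (to my knowledge this has been on CRAN since 1.15.0, but check your version). The helper `summarise_as` and its parameter names are hypothetical. Character strings passed through `env` are substituted as symbols, so the column, the output name and the function can all be given as strings.

```r
library(data.table)  # assumes a version that supports the `env` argument

set.seed(42)  # only so the example is reproducible
dt <- data.table(x = c(1, 1, 2, 2, 3, 3), y = rnorm(6))

# Hypothetical helper: every name is passed as a string and substituted
# into the j expression via env before it is evaluated
summarise_as <- function(d, value_col, out_name, fun_name = "mean") {
  d[, .(out = fun(val)), by = x,
    env = list(out = out_name, fun = fun_name, val = value_col)]
}

res <- summarise_as(dt, value_col = "y", out_name = "avg")
names(res)  # "x" "avg"
```

Because the substitution happens before evaluation, `j` is still an ordinary `.(avg = mean(y))` call by the time data.table sees it, which is not the case for the `setNames` approach.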
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
