Strange behavior of data.table with 'by' argument?

I just want a function that sums value over rows in a data.table, overwriting the old values, using the by argument. I would expect all rows that fall into the same by group to end up with the same result. I have created two examples. The only difference between them is the take column: in the first, the leading three digits of each value are removed (e.g. 55 instead of 25555). The first example works as expected; the second shows unexpected behavior. I would be glad of any hint as to what I'm doing wrong.

R version: 4.0.4

data.table version: 1.14.2

library(data.table)

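# sum value within each take group, overwriting the old values (on a copy)
superpose <- function(DT){
  DT <- copy(DT)                          # avoid modifying the caller's table by reference
  DT[, value := sum(value), by = take]
  DT[]                                    # return the modified copy explicitly
}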

v1a = c(   55:   59,    33:   37,    54:   56,    32:   34,    58:   60,    36:   38)
v1b = c(25555:25559, 20533:20537, 25554:25556, 20532:20534, 25558:25560, 20536:20538)
all.equal(as.integer(factor(v1a)), as.integer(factor(v1b)))
# [1] TRUE

v2 = 1:22

data1 <- data.table(take = v1a, value = v2) # 1st data - expected behavior
data2 <- data.table(take = v1b, value = v2) # 2nd data - unexpected behavior

res1 <- superpose(data1)
res2 <- superpose(data2)

cbind(res1, res2)
which(res1[, value] != res2[, value])
# [1]  8 11 15 16 19 20 21 22
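The grouped sums that both calls should produce can be computed independently of data.table with base R's ave(). This is a minimal cross-check added for illustration, not part of the original question; it recreates v1a, v1b and v2 so that it runs on its own, and the names expected_a and expected_b are hypothetical.

```r
# Base-R reference: replace each value by the sum of its group,
# where groups are defined by the take values.
v1a <- c(55:59, 33:37, 54:56, 32:34, 58:60, 36:38)
v1b <- c(25555:25559, 20533:20537, 25554:25556, 20532:20534, 25558:25560, 20536:20538)
v2  <- 1:22

expected_a <- ave(v2, v1a, FUN = sum)
expected_b <- ave(v2, v1b, FUN = sum)

# The two groupings are identical up to relabelling, so the sums must match.
identical(expected_a, expected_b)
# [1] TRUE
```

With a data.table build that groups correctly, res1$value and res2$value should both equal expected_a.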


Solution 1:

There was already an open issue on GitHub about this bug in data.table 1.14.3, the development version at the time. It has since been fixed in the latest development version, which can be installed with:

update.dev.pkg()
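After upgrading, one way to confirm the fix is to print the loaded data.table version and compare a grouped := sum against a base-R reference. The sketch below is an illustrative check, not part of the answer; it reuses the second example's data, and grouped_sum_ok is a hypothetical name.

```r
library(data.table)

# Which data.table build is loaded? (the answer attributes the bug to 1.14.3)
print(packageVersion("data.table"))

v1b <- c(25555:25559, 20533:20537, 25554:25556, 20532:20534, 25558:25560, 20536:20538)
v2  <- 1:22

DT <- data.table(take = v1b, value = v2)
DT[, value := sum(value), by = take]   # grouped sum, assigned in place

# Reference result computed without data.table
expected <- ave(v2, v1b, FUN = sum)

# TRUE on a build whose grouped := sum is correct
grouped_sum_ok <- isTRUE(all.equal(DT$value, expected))
grouped_sum_ok
```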

This is a cautionary tale about why only the brave should use development code, and why they should expect issues to arise if they do.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
