R data.table dcast() function adds garbage after decimal point?
After a long struggle with my code, I think I found strange behavior in the dcast() function from the data.table library. Can anyone confirm it, or am I doing something wrong?
For the sake of example:
library(data.table)
tt <- data.table(a=runif(n=300,min=0,max=1000000),
                 b=rep(paste("d",1:3,sep="",collapse=NULL),each=100),
                 c=rep(LETTERS[1:3],each=100))
t2 <- dcast(tt, c~b, fun.aggregate=sum, value.var = "a")
t2
# c d1 d2 d3
# 1: A 2531364379 0 0
# 2: B 0 2527589493 0
# 3: C 0 0 2532147262
Now, I would assume that the numbers in t2 are exactly the same as in tt. But they are not, since some garbage appears after the decimal point. For example, in the third column:
t2$d3[3]-round(t2$d3[3],0)
# [1] 0.3269196
Solution 1:[1]
Use options(digits=22) (or some other sufficiently high value; 22 is the largest that options() accepts). This has nothing to do with how the number is stored, only with how it is displayed on the console.
A reproducible example:
library(data.table)
set.seed(42)
tt <- data.table(a=runif(n=300,min=0,max=1000000),
                 b=rep(paste("d",1:3,sep="",collapse=NULL),each=100),
                 c=rep(LETTERS[1:3],each=100))
t2 <- dcast(tt, c~b, fun.aggregate=sum, value.var = "a")
t2
# c d1 d2 d3
# <char> <num> <num> <num>
# 1: A 52447875 0 0
# 2: B 0 51995321 0
# 3: C 0 0 44077214
t2$d3[3]-round(t2$d3[3],0)
# [1] 0.4191433
To better see the digits:
options(digits=22)
t2
# c d1 d2 d3
# <char> <num> <num> <num>
# 1: A 52447874.720674008 0.000000000 0.000000000
# 2: B 0.000000000 51995320.511283353 0.000000000
# 3: C 0.000000000 0.000000000 44077214.419143274
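However, there is no problem with the underlying numbers. Regardless of the value of digits, the fractional part is still there: the values drawn by runif() are not integers, so their sums are not integers either.
The difference between what a number is and how it is printed can be demonstrated as follows: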
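To check that claim directly, here is a minimal base-R sketch. It assumes the literal x below, which is simply copied from the d3 cell printed above with digits=22 (so it may differ from the real sum in the last bits), and it uses helper names (old, shown_default, frac_default and so on) that are made up for illustration. It compares the stored fractional part under two different digits settings.

```r
# Assumption: x is typed in from the printed d3 value above,
# not computed by dcast(); it only stands in for t2$d3[3].
x <- 44077214.419143274

old <- options(digits = 7)        # 7 is R's default; keep the previous setting
shown_default <- format(x)        # string the console would show at 7 digits
frac_default  <- x - round(x)     # fractional part, computed from the stored value

options(digits = 22)
shown_wide <- format(x)           # same value, more digits displayed
frac_wide  <- x - round(x)

options(old)                      # restore whatever was set before

shown_default                        # "44077214"
identical(frac_default, frac_wide)   # TRUE: the stored value never changed
```

Only the two strings differ; the arithmetic on x gives the same answer either way.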
options(digits=1)
pi
# [1] 3
options(digits=22)
pi
# [1] 3.1415926535897931
At no point did the real value of pi change, just how it is shown on the console.
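If changing a global option feels too heavy, the display can also be widened one call at a time. The sketch below is only one way to do it, again assuming a literal x copied from the output above and R's default digits setting of 7; the variable names s1, s2 and r are hypothetical. It uses the base functions print(), sprintf(), format() and round().

```r
# Assumption: x is copied from the printed d3 value, standing in for t2$d3[3].
x <- 44077214.419143274

print(x, digits = 15)           # more significant digits for this call only

s1 <- sprintf("%.6f", x)        # string with exactly 6 decimals
s2 <- format(x, nsmall = 2)     # string with at least 2 decimals

r <- round(x, 2)                # unlike the calls above, this changes the value

s1   # "44077214.419143"
s2   # "44077214.42"
```

Note that print(), sprintf() and format() only affect what is shown, while round() returns a different number. A double carries roughly 15 to 17 significant decimal digits, so anything displayed beyond that is representation detail rather than extra precision.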
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | r2evans |
