What is the meaning of these data.table expressions? [closed]
Could someone tell me what these two data.table expressions do?
dt[,R_Fuel:=c(0, diff(dt[, Fuel]))]
dt[R_Fuel < 0 | R_Fuel > 5, R_Fuel:=NA]
Solution 1:[1]
That is the data.table parlance/dialect.
In data.table, assignments should be done inside of the [ brackets, and instead of the typical R assignment operators <- / =, one needs to use :=. Your first line is equivalent to
dt$R_Fuel <- c(0, diff(dt$Fuel))
However, even this is not "good" data.table code: the use of dt[, Fuel] is unnecessary, it should be just
dt[, R_Fuel := c(0, diff(Fuel))]
If you're curious about what the R code itself is doing, diff(.) returns the differences between consecutive values of a vector. Because it is the diffs, if done on a vector of length n, the return value is length n - 1. Since data.frames (and data.tables) require that all columns have the same number of elements, the diffs need to have one value padded; in this case, pre-padded with 0.
Similar to base R, when using [i, j]-notation, the i is a row-selector. Unlike base R, though, when j includes an assignment (as both of your expressions do), the i-component does not subset the data in the return, it just changes which rows get the calculation. The second expression is similar to any of the following (base R and data.table-canonical versions, generally equivalent):
## basic R
dt$R_Fuel <- ifelse(dt$R_Fuel < 0 | dt$R_Fuel > 5, NA, dt$R_Fuel)
## canonical data.table
dt[, R_Fuel := ifelse(R_Fuel < 0 | R_Fuel > 5, NA, R_Fuel)]
## canonical data.table using the preferred `fifelse`
dt[, R_Fuel := fifelse(R_Fuel < 0 | R_Fuel > 5, NA_real_, R_Fuel)]
FYI, it might be more readable to use between here:
dt[!between(R_Fuel, 0, 5), R_Fuel := NA]
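To check that using i together with := keeps every row and only changes the selected ones, here is a small sketch. The table `base`, its `id` column, and the R_Fuel values are made up for illustration; the values include the boundaries 0 and 5 to show that between is inclusive by default.

```r
library(data.table)

# Hypothetical values covering both conditions plus the boundaries 0 and 5.
base <- data.table(id = 1:6, R_Fuel = c(0, 2, -1, 9, 5, 1))

# Work on copies, because := modifies a data.table by reference.
a <- copy(base)
b <- copy(base)
d <- copy(base)

# i selects the rows to update; the table itself is not subset.
a[R_Fuel < 0 | R_Fuel > 5, R_Fuel := NA]

# Same update written as a whole-column recalculation.
b[, R_Fuel := fifelse(R_Fuel < 0 | R_Fuel > 5, NA_real_, R_Fuel)]

# Same update using between(), which includes both bounds by default.
d[!between(R_Fuel, 0, 5), R_Fuel := NA]

nrow(a)    # 6: no rows were dropped
a$R_Fuel   # 0 2 NA NA 5 1
```

All three copies should end up with the same column. Only the -1 and the 9 are replaced; the rows with 0 and 5 are left alone.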
Solution 2:[2]
The first expression creates a new column (R_Fuel) in the data.table dt, which holds the row-over-row change (see ?diff) in the values of Fuel in dt. Since there is no previous value for the first row, 0 is prepended to the set of differences. It would be better to write dt[, R_Fuel := c(0, diff(Fuel))]
The second line then sets the new column R_Fuel to NA in all rows where R_Fuel is less than 0 or greater than 5.
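The two steps can be traced on a toy table. This is only a sketch: the vector name `fuel` and its readings are invented for illustration, not taken from the question's data.

```r
library(data.table)

# Made-up fuel readings.
fuel <- c(10, 12, 11, 20, 21)

# diff() yields one fewer value than its input.
length(diff(fuel))   # 4

dt <- data.table(Fuel = fuel)

# Step 1: row-over-row change, with 0 prepended for the first row.
dt[, R_Fuel := c(0, diff(Fuel))]
step1 <- dt$R_Fuel   # 0 2 -1 9 1

# Step 2: blank out changes below 0 or above 5.
dt[R_Fuel < 0 | R_Fuel > 5, R_Fuel := NA]

print(dt)
```

After step 2 the R_Fuel column should read 0, 2, NA, NA, 1: the drop of 1 and the jump of 9 are replaced, and the Fuel column is unchanged.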
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | r2evans |
| Solution 2 | langtang |
