What is the meaning of these data.table expressions? [closed]

Could someone tell me what these two data.table expressions do?

dt[,R_Fuel:=c(0, diff(dt[, Fuel]))]

dt[R_Fuel < 0 | R_Fuel > 5, R_Fuel:=NA]


Solution 1:[1]

That is the data.table parlance/dialect.

  • In data.table, assignments should be done inside of the [ brackets, and instead of the typical R assignment operators <-/=, one needs to use :=. Your first line is equivalent to

    dt$R_Fuel <- c(0, diff(dt$Fuel))
    

    However, even this is not "good" data.table code: the use of dt[,Fuel] is unnecessary, because columns can be referenced by name inside the [ brackets. It should be just

    dt[, R_Fuel := c(0, diff(Fuel))]
    
  • If you're curious about what the R code itself is doing, diff(.) returns the differences between consecutive values of a vector. A vector of length n has only n - 1 consecutive pairs, so the return value has length n - 1. Since data.frames (and data.tables) require that all columns have the same number of elements, the diffs need one padding value; in this case they are pre-padded with 0.

  • Similar to base R, when using [i,j]-notation, the i is a row-selector. Unlike base R, though, when j includes an assignment (as both of your expressions do), the i-component does not subset the data that is returned; it only controls which rows get the calculation. The second expression is similar to any of the following (base R and canonical data.table versions, generally equivalent):

    ## base R
    dt$R_Fuel <- ifelse(dt$R_Fuel < 0 | dt$R_Fuel > 5, NA, dt$R_Fuel)
    ## canonical data.table
    dt[, R_Fuel := ifelse(R_Fuel < 0 | R_Fuel > 5, NA, R_Fuel)]
    ## canonical data.table using the preferred `fifelse`
    dt[, R_Fuel := fifelse(R_Fuel < 0 | R_Fuel > 5, NA_real_, R_Fuel)]
    

    FYI, it might be more readable to use between here:

    dt[ !between(R_Fuel, 0, 5), R_Fuel := NA ]
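For anyone who wants to check the points above, here is a minimal sketch on a made-up six-row table. The Fuel values are hypothetical (only the column names come from the question), and it assumes the data.table package is installed. It compares the original filter with the between() version on a copy, since := modifies the table by reference.

```r
library(data.table)

# Hypothetical toy data; only the column names come from the question.
dt <- data.table(Fuel = c(50, 53, 52, 60, 61, 64))

# Step 1: change from the previous row, padded with 0 for the first row.
dt[, R_Fuel := c(0, diff(Fuel))]

# := modifies by reference, so take an explicit copy before comparing variants.
dt2 <- copy(dt)

# Step 2, as written in the question.
dt[R_Fuel < 0 | R_Fuel > 5, R_Fuel := NA]

# Step 2, using between(), which includes both bounds by default.
dt2[!between(R_Fuel, 0, 5), R_Fuel := NA]

# The i condition did not drop any rows; it only chose which rows were updated.
n_rows <- nrow(dt)
same_result <- identical(dt$R_Fuel, dt2$R_Fuel)
print(dt)
```

All six rows are still present afterwards; only the R_Fuel values of the two out-of-range steps (the drop of 1 and the jump of 8) become NA. Because between() is inclusive on both ends by default, its negation matches the original R_Fuel < 0 | R_Fuel > 5 condition.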
    

Solution 2:[2]

The first expression creates a new column (R_Fuel) in the data.table dt, which holds the row-over-row change (see ?diff) in the values of Fuel in dt. Since there is no previous value for the first row, 0 is prepended to the set of differences. It would be better to write dt[,R_Fuel:=c(0,diff(Fuel))].

The second line then sets R_Fuel to NA in all rows where R_Fuel is less than 0 or greater than 5.
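
To make the two steps concrete, here is a small base-R sketch on a plain vector instead of a data.table column. The fuel readings are invented for illustration, and the variable names are placeholders.

```r
# Hypothetical fuel readings, one per row.
fuel <- c(10, 12, 11, 20, 21)

# diff() gives the change between consecutive readings: one fewer value than the input.
step <- diff(fuel)

# Put 0 in front so the result lines up with the original rows.
r_fuel <- c(0, step)

# Blank out changes that are negative or larger than 5.
r_fuel[r_fuel < 0 | r_fuel > 5] <- NA

print(length(step))  # 4
print(r_fuel)        # 0  2 NA NA  1
```

The drop from 12 to 11 and the jump from 11 to 20 are the two steps that end up as NA.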

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 r2evans
Solution 2 langtang