Category "data.table"

Flagging continuous observations and creating enrolment spans

I have a few large enrolment datasets and I'm trying to create two things: I'd like to flag each uninterrupted monthly observation (final_df1) I'd like to creat

How to do sum and filter in rbind function

I am using an R package which extracts data from tables in a database based on the flag for each table. If the flag is 1, extract data from that table. If the f

Extracting rows based on more than two partial strings that must all be part of the string

I want to extract rows that must contain two or more partial strings. For example, suppose I have the following data.table df <- data.table(player = c('A', '

Filling a column based on the value of another column in data.table

I have data as follows: dat <- structure(list(amount_of_categories = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L ), mun

use 'start' and 'end' values in two columns to specify fill range over remaining columns in R

I need to fill each row of a matrix with '1' between 'start' and 'end' columns, where the 'start' and 'end' column names (dates in the real data) are specified

Using R to Calculate the time since binary output=1

I have binary data in a dataframe with a time feature and I'm looking to produce a dataframe like below with a new column "duration since =1". I was able to fi

dplyr: Replace multiple values based on condition in a selection of columns

I am trying to conditionally replace multiple values in a data frame. In the following data set, I want to replace in columns 3:5 all values of 2 by "X" and all value

Using setDT inside a function

I'm writing a function that, among other things, coerces the input into a data.table. library(data.table) df <- data.frame(id = 1:10) f <- function(df){

data.table foverlaps with as.yearmon wrong result

I am trying to use foverlaps from the data.table package. When I use it together with as.yearmon I get wrong results. Here is a small example: library(data.tabl

R data.table struggling with conditional subsetting when column name is predefined elsewhere

Let's say I have a data table library(data.table) DT <- data.table(x=c(1,1,0,0),y=c(0,1,2,3)) column_name <- "x" x y 1: 1 0 2: 1 1 3: 0 2 4: 0 3 And

Making a while loop more efficient for use on a large data.table to delete rows based on certain conditions

I have a pretty big amount of data in a data table. I would like to delete a number of rows if there is a certain value in a cell. Below is an excerpt from my d

R - Data.table - Using variable column names in RHS operations

How do I use variable column names on the RHS of := operations? For example, given this data.table "dt", I'd like to create two new columns, "first_y" and "firs

Combination of all pairs of rows using R

Here is my dataset: data <- read.table(header = TRUE, text = " group index group_index x y z a 1 a1 12 13 14 a 2 a2

data.table join with date

Hello, I'm trying to extract some ids with a group and Date in range > d1 id group Date 1: 1 A 2017-07-02 2: 2 A 2017-07-04 3: 3 A

fread does not read character vector

I am trying to download a list using R with the following code: name <- paste0("https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx") master

How to map values from a data.table to a data.table (R)

I have two map/data.tables. One consists of key-values and another one just of some keys. I want to map the values from the first map to the keys of the second.

Taking only the maximum values of duplicate IDs for all columns of a data frame in R

I have a data frame of 24525 rows and 22 columns. The last column is the ID column; the others are numeric. The number of unique IDs is 18414 and some IDs are repeated more th

dcast warning: ‘Aggregation function missing: defaulting to length’

My df looks like this: Id Task Type Freq 3 1 A 2 3 1 B 3 3 2 A 3 3 2 B 0 4 1 A 3 4 1

Trying to benchmark dplyr vs data.table

Why does this code not work? How can I benchmark these two expressions? library(data.table) library(dplyr) dt <- as.data.table(mtcars) (lb <- bench::mar