'R update a tibble using data from a second tibble as row and column
I have an empty tibble full of NA's with the rows and columns, respectively named for id numbers and all the dates in a period of time. For example, this code:
tbl <- tibble(PERSONAL_ID = c("A", "B", "C", "D"))
dates = as.character((seq(as.Date("2016-01-01"), as.Date("2016-01-05"), by="days")))
tbl[dates] <- NA
tbl <- column_to_rownames(tbl, var = "PERSONAL_ID")
I have a second tibble that contains columns matching up one ID number with one date, as in this example:
enrollments <- tibble(ID = c("D", "B", "C", "D"),
date = c("2016-01-01", "2016-01-03", "2016-01-05", "2016-01-02"))
What I would like to do is add "1" to the row and column of the first tibble (tbl) corresponding to the ID and date listed in the second tibble (enrollments). For the example code above, the desired output would be:
2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
A <NA> <NA> <NA> <NA> <NA>
B <NA> <NA> 1 <NA> <NA>
C <NA> <NA> <NA> <NA> 1
D 1 1 <NA> <NA> <NA>
Thank you!
Solution 1:[1]
Here's a tidyverse approach.
- First change your
tblfrom awideformat to alongformat so that it matches the format ofenrollments. - Create a
Countcolumn inenrollments, and every row would beCount = 1. - Then
left_jointhe transformedtblwithenrollmentsusingIDanddateas the joining field. - Finally, transform the
longformat back to awideformat and setrownames.
library(tidyverse)
left_join(tbl %>% rownames_to_column(var = "ID") %>%
pivot_longer(-ID, names_to = "date", values_to = "Count") %>%
select(-Count),
enrollments %>% mutate(Count = 1),
by = c("ID", "date")) %>%
pivot_wider(names_from = "date", values_from = "Count") %>%
column_to_rownames(var = "ID")
Output
2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
A NA NA NA NA NA
B NA NA 1 NA NA
C NA NA NA NA 1
D 1 1 NA NA NA
Your dataset as reference
tbl
2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
A NA NA NA NA NA
B NA NA NA NA NA
C NA NA NA NA NA
D NA NA NA NA NA
enrollments
# A tibble: 4 x 2
ID date
<chr> <chr>
1 D 2016-01-01
2 B 2016-01-03
3 C 2016-01-05
4 D 2016-01-02
Solution 2:[2]
Here's a data.table approach, using PERSONAL_IDS, dates, and enrollments
dcast(rbind(
rbindlist(lapply(setdiff(PERSONAL_ID,enrollments$ID), \(x) data.table(ID=x, date=dates)))[,value:=NA],
enrollments[,value:=1]
), ID~date,value.var="value")
Output:
ID 2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
1: A NA NA NA NA NA
2: B NA NA 1 NA NA
3: C NA NA NA NA 1
4: D 1 1 NA NA NA
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | langtang |
