'How to conditionally populate a group of columns in R?
I have a dataframe like this:
date <- c('2020-1', '2020-1','2020-1','2020-2','2020-2','2020-2','2020-2','2020-3','2020-3','2020-3','2020-3')
ID1 <- c('A', 'A', 'B', 'A', 'A', 'A', 'A', 'C', 'A', 'C', 'C')
ID2 <- c('a', 'a', 'a', 'b', 'c', 'c', 'b', 'c', 'd', 'd', 'd')
ID3 <- c('xy', 'xz', 'ab', 'ac', 'ac', 'ac', 'zy', 'yz', 'yy', 'zz', 'cc')
value <- c(100, 10, 50, 0, 10, 100, 15, 16, 17, 100, 50)
df1 <- data.frame(date, ID1, ID2, ID3, value)
I need to form columns based on unique full ID = ID1_ID2_ID3. This is what I did:
df2 <- df1 %>% unite(full_id, c("ID1", "ID2", "ID3"), sep = "_", remove = FALSE)
full_id.unique <- df2$full_id %>% unique()
df2[,full_id.unique]<-NA
Now I'd like to populate these columns with the following logic. Each row of each new column should contain the sum of the values from the "value" column, which are taken from those of its rows for which the date, ID1 and ID2 match the same values from the current row and do not match ID3. For example, the first cell of the A_a_xy column will be equal to 10, because the second line matches the first one by date, ID1, ID2 and does not match in ID3, the rest are 0, and so on. The final result (for the first four added columns) will look like this:
A_a_xy <- c(10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
A_a_xz <- c(0,100,0,0,0,0,0,0,0,0,0)
B_a_ab <- c(0,0,0,0,0,0,0,0,0,0,0)
A_b_ac <- c(0,0,0,15,0,0,0,0,0,0,0)
This is a rather complicated logic for me, I do not know how to approach this task.
Solution 1:[1]
We may group by 'date', 'ID1', 'ID2', loop across columns from 'A_a_xy' to 'C_d_cc', if the full_id value is equal to the 'cur_column(), get the sum` of 'value' and subtract the 'value' where the column name matches the 'ID3'
library(dplyr)
df2 %>%
group_by(date, ID1, ID2) %>%
mutate(across(A_a_xy:C_d_cc,
~ case_when(full_id == cur_column()
~sum(value, na.rm = TRUE) -
sum(value[trimws(cur_column(), whitespace = ".*_") == ID3]), TRUE ~ 0))) %>%
ungroup
-output
# A tibble: 11 × 16
date full_id ID1 ID2 ID3 value A_a_xy A_a_xz B_a_ab A_b_ac A_c_ac A_b_zy C_c_yz A_d_yy C_d_zz C_d_cc
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2020-1 A_a_xy A a xy 100 10 0 0 0 0 0 0 0 0 0
2 2020-1 A_a_xz A a xz 10 0 100 0 0 0 0 0 0 0 0
3 2020-1 B_a_ab B a ab 50 0 0 0 0 0 0 0 0 0 0
4 2020-2 A_b_ac A b ac 0 0 0 0 15 0 0 0 0 0 0
5 2020-2 A_c_ac A c ac 10 0 0 0 0 0 0 0 0 0 0
6 2020-2 A_c_ac A c ac 100 0 0 0 0 0 0 0 0 0 0
7 2020-2 A_b_zy A b zy 15 0 0 0 0 0 0 0 0 0 0
8 2020-3 C_c_yz C c yz 16 0 0 0 0 0 0 0 0 0 0
9 2020-3 A_d_yy A d yy 17 0 0 0 0 0 0 0 0 0 0
10 2020-3 C_d_zz C d zz 100 0 0 0 0 0 0 0 0 50 0
11 2020-3 C_d_cc C d cc 50 0 0 0 0 0 0 0 0 0 100
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
