dplyr: Populating a column based on some condition by iterating over a vector
I am looking for a tidyverse way to populate a column based on some condition by iterating over a vector. I would like a tidy solution because my real data is large and nested, unlike the following minimal example.
Column prev represents the previous state of a system and new represents the new state. prev and new are always in sequence, but some of them may loop, as in the example below. The values 0 and 100 are the start and finish states; the states represented here as letters are the important ones.
For certain combinations of prev and new (i.e. those satisfying both prev_condition and new_condition), I need to iterate over a larger vector vec (it has more elements than there are possible combinations of states) and place its values, in order, in column to_do.
Note that combinations of prev and new do not need to be unique, which is why I added column change to index every state change that took place.
Here is a minimal example and a solution in base R. I am really hoping for an elegant tidyverse solution. Thank you.
# Minimal reproducible data
df <- data.frame(prev = c("0", rep(letters[1:3], 2), rep(letters[4:10], 3)),
                 new = c(rep(letters[1:3], 2), rep(letters[4:10], 3), "100"),
                 change = 1:28,
                 to_do = rep(NA, 28))
# Vector for iteration
vec <- 1:30
# Conditions
prev_condition <- c(letters[4:6]) # prev state must be any of: "d" "e" "f"
new_condition <- c(letters[5:7]) # new state must be any of: "e" "f" "g"
# base R solution
n_row <- length(df[df$prev %in% prev_condition & df$new %in% new_condition, "to_do"])
df[df$prev %in% prev_condition & df$new %in% new_condition, "to_do"] <- vec[seq_len(n_row)]
df
#> prev new change to_do
#> 1 0 a 1 NA
#> 2 a b 2 NA
#> 3 b c 3 NA
#> 4 c a 4 NA
#> 5 a b 5 NA
#> 6 b c 6 NA
#> 7 c d 7 NA
#> 8 d e 8 1
#> 9 e f 9 2
#> 10 f g 10 3
#> 11 g h 11 NA
#> 12 h i 12 NA
#> 13 i j 13 NA
#> 14 j d 14 NA
#> 15 d e 15 4
#> 16 e f 16 5
#> 17 f g 17 6
#> 18 g h 18 NA
#> 19 h i 19 NA
#> 20 i j 20 NA
#> 21 j d 21 NA
#> 22 d e 22 7
#> 23 e f 23 8
#> 24 f g 24 9
#> 25 g h 25 NA
#> 26 h i 26 NA
#> 27 i j 27 NA
#> 28 j 100 28 NA
Created on 2022-05-09 by the reprex package (v2.0.1)
Solution 1:[1]
Using dplyr, we can build the column with replace(): start from a vector of NA and, wherever the condition is TRUE, replace the values with a sequence running from 1 to the number of matching rows (the sum of the logical vector).
library(dplyr)
df %>%
  mutate(to_do = replace(rep(NA_real_, n()),
                         prev %in% prev_condition & new %in% new_condition,
                         seq_len(sum(prev %in% prev_condition & new %in% new_condition))))
Output:
prev new change to_do
1 0 a 1 NA
2 a b 2 NA
3 b c 3 NA
4 c a 4 NA
5 a b 5 NA
6 b c 6 NA
7 c d 7 NA
8 d e 8 1
9 e f 9 2
10 f g 10 3
11 g h 11 NA
12 h i 12 NA
13 i j 13 NA
14 j d 14 NA
15 d e 15 4
16 e f 16 5
17 f g 17 6
18 g h 18 NA
19 h i 19 NA
20 i j 20 NA
21 j d 21 NA
22 d e 22 7
23 e f 23 8
24 f g 24 9
25 g h 25 NA
26 h i 26 NA
27 i j 27 NA
28 j 100 28 NA
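Note that the code above numbers the matching rows with seq_len(), which coincides with vec only because vec is 1:30 in the example. If vec held arbitrary values, one option would be to index vec directly. The following is a minimal sketch under that assumption, not the answerer's code; the helper column hit is a name introduced here purely for illustration.

```r
library(dplyr)

# Data, vector and conditions as defined in the question
df <- data.frame(prev = c("0", rep(letters[1:3], 2), rep(letters[4:10], 3)),
                 new = c(rep(letters[1:3], 2), rep(letters[4:10], 3), "100"),
                 change = 1:28)
vec <- 1:30
prev_condition <- letters[4:6]
new_condition <- letters[5:7]

res <- df %>%
  mutate(
    # `hit` is a temporary helper column (hypothetical name)
    hit = prev %in% prev_condition & new %in% new_condition,
    # running count of matches on matching rows, NA elsewhere;
    # indexing vec with NA yields NA, so non-matching rows stay NA
    to_do = vec[replace(cumsum(hit), !hit, NA)]
  ) %>%
  select(-hit)
```

Because the index is built from cumsum(), the k-th matching row receives vec[k] whatever vec contains, and rows that do not match are left as NA of the same type as vec.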
Solution 2:[2]
Will this work for you?
library(dplyr)
df %>%
  mutate(to_do = ifelse(new %in% new_condition &
                          prev %in% prev_condition, rank(to_do), to_do),
         to_do = replace(to_do, !is.na(to_do),
                         seq_len(sum(!is.na(to_do)))))
prev new change to_do
1 0 a 1 NA
2 a b 2 NA
3 b c 3 NA
4 c a 4 NA
5 a b 5 NA
6 b c 6 NA
7 c d 7 NA
8 d e 8 1
9 e f 9 2
10 f g 10 3
11 g h 11 NA
12 h i 12 NA
13 i j 13 NA
14 j d 14 NA
15 d e 15 4
16 e f 16 5
17 f g 17 6
18 g h 18 NA
19 h i 19 NA
20 i j 20 NA
21 j d 21 NA
22 d e 22 7
23 e f 23 8
24 f g 24 9
25 g h 25 NA
26 h i 26 NA
27 i j 27 NA
28 j 100 28 NA
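Two edge cases may be worth guarding against with either approach: no row matching at all, and vec being shorter than the number of matching rows. The sketch below wraps the fill logic in a small function that handles both. fill_in_order is a hypothetical helper written for this illustration; it is not part of dplyr or base R.

```r
# Hypothetical helper: place `values` in order on the positions where
# `hit` is TRUE, leaving NA everywhere else.
fill_in_order <- function(hit, values) {
  n_hit <- sum(hit)
  # fail early instead of silently padding with NA when `values` is too short
  stopifnot(n_hit <= length(values))
  # NA vector of the same type as `values`
  out <- rep(values[NA_integer_], length(hit))
  # seq_len(0) is empty, whereas 1:0 would expand to c(1, 0)
  out[hit] <- values[seq_len(n_hit)]
  out
}

some_hits <- fill_in_order(c(FALSE, TRUE, FALSE, TRUE), c("x", "y", "z"))
no_hits <- fill_in_order(c(FALSE, FALSE), 1:3)
```

With the question's objects, it could be called inside mutate() as to_do = fill_in_order(prev %in% prev_condition & new %in% new_condition, vec).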
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | akrun |
| Solution 2 | TarJae |
