'How to enumerate each time there is a change in value of a dataframe column?
I have a Dataframe Below DFSurv, I would like to create the column Event which :
if TF[i]==TF[i-1] then Event[i] = Event[i-1]
else Event[i] = Event[i-1] + 1
this should be done for each group prov, and the lag is computed ordering by column Per.
The main idea is to add one whenever TF values are changing.
set.seed(1)
DFSurv = data.frame(Per = c(1:10,1:10,1:10, 1:10),
prov = c(rep("A",10),rep("B",10), rep("D",10),rep("F",10)),
TF = sample(0:1,size=40,replace=TRUE))
When using dplyr::lag I am getting wrong results:
DFSurv %>% mutate(Event = 0) %>%
arrange(prov, Per) %>%
group_by(prov) %>%
mutate(Event = if_else(TF == dplyr::lag(TF, default =0),
dplyr::lag(Event, default =0),
dplyr::lag(Event, default =0)+1))
# A tibble: 40 x 4
# Groups: prov [4]
Per prov TF Event
<int> <chr> <int> <dbl>
1 1 A 0 0
2 2 A 1 1
3 3 A 0 1
4 4 A 0 0
5 5 A 1 1
6 6 A 0 1
7 7 A 0 0
8 8 A 0 0
9 9 A 1 1
10 10 A 1 0
# ... with 30 more rows
These results are wrong because Event[3] TF[3] != TF[2] thus the value should be Event[2]+1 which is 2.
This can be done with loops but a vectorized approach is preferred.
Solution 1:[1]
The essence to solve your problem is cumsum.
Note that my set.seed result is different from yours.
library(dplyr)
set.seed(1)
DFSurv = data.frame(Per = c(1:10,1:10,1:10, 1:10),
prov = c(rep("A",10),rep("B",10), rep("D",10),rep("F",10)),
TF = sample(0:1,size=40,replace=TRUE))
DFSurv %>%
group_by(prov) %>%
mutate(Event = cumsum(abs(c(0, diff(TF)))))
#> # A tibble: 40 × 4
#> # Groups: prov [4]
#> Per prov TF Event
#> <int> <chr> <int> <dbl>
#> 1 1 A 0 0
#> 2 2 A 1 1
#> 3 3 A 0 2
#> 4 4 A 0 2
#> 5 5 A 1 3
#> 6 6 A 0 4
#> 7 7 A 0 4
#> 8 8 A 0 4
#> 9 9 A 1 5
#> 10 10 A 1 5
#> # … with 30 more rows
Created on 2022-05-01 by the reprex package (v2.0.1)
Solution 2:[2]
Try this:
library(tidyverse)
set.seed(1)
DFSurv <- data.frame(
Per = c(1:10, 1:10, 1:10, 1:10),
prov = c(rep("A", 10), rep("B", 10), rep("D", 10), rep("F", 10)),
TF = sample(0:1, size = 40, replace = TRUE)
)
DFSurv %>%
arrange(prov, Per) %>%
group_by(prov) %>%
mutate(event = if_else(TF != lag(TF) & !is.na(lag(TF)), 1, 0),
event_cum = cumsum(event))
#> # A tibble: 40 × 5
#> # Groups: prov [4]
#> Per prov TF event event_cum
#> <int> <chr> <int> <dbl> <dbl>
#> 1 1 A 0 0 0
#> 2 2 A 1 1 1
#> 3 3 A 0 1 2
#> 4 4 A 0 0 2
#> 5 5 A 1 1 3
#> 6 6 A 0 1 4
#> 7 7 A 0 0 4
#> 8 8 A 0 0 4
#> 9 9 A 1 1 5
#> 10 10 A 1 0 5
#> # … with 30 more rows
Created on 2022-05-01 by the reprex package (v2.0.1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |
