'R categorize row based on dummy variables

I have a data set of ATM IDs that are coded with dummy variables that tells us whether the ATM is open or closed. The goal is to generate a new column (type) that categorizes each ATM based on its opening/closure behavior. In this data, 1 in the dummy variable tells us that the ATM is open, and 0 tells us that the ATMs is closed. Here is the data and expected output.

data <- tribble(
  ~atm_id, ~nov_2019,  ~feb_2020, ~may_2020, ~aug_2020, ~nov_2020, ~type,
  "xx1", 0,  0, 0, 0, 0,  "A", 
  "xx2", 0,  1, 1, 1, 1,  "B",
  "xx3", 0, 0, 1, 1, 1, "B", 
  "xx4", 0, 0, 0, 1, 1, "B",
  "xx5", 0, 1, 0, 1, 1, "C",
  "xx6", 0, 1, 0, 1, 0, "C"
)

I am trying to mutate the type variable and categorize each type of opening/closure behavior.

  • Type A - ATMs that closed in the first time period, and remained closed (all zeros)
  • Type B - ATMs that closed in the first time period, eventually reopened, and have stayed open so far.
  • Type C - ATMs that closed in the first time period, eventually reopened, and then closed again after reopening - i.e., (0, 1, 0, 1)

The month/year columns go up to 2022, and we will be adding more data later on, so the ideal the code is flexible to accommodate. However, these three are the basic types of opening/closure behaviors, and I need to capture them somehow using row-wise operations or some other method.



Solution 1:[1]

You can use c_across with case_when.

  • In the first case, type is A if all values are 0 between nov_2019 and nov_2020
  • In the second case, type is B if there are two distinct consecutive values (using data.table::rleid).
  • Otherwise, type is C. This could be replaced by something like n_distinct(data.table::rleid(c_across(nov_2019:nov_2020))) > 2.
library(dplyr)
data %>% 
  rowwise() %>% 
  mutate(new = case_when(all(c_across(nov_2019:nov_2020) == 0) ~ "A",
                         n_distinct(data.table::rleid(c_across(nov_2019:nov_2020))) == 2 ~ "B",
                         T ~ "C"))

# A tibble: 6 x 8
# Rowwise: 
  atm_id nov_2019 feb_2020 may_2020 aug_2020 nov_2020 type  new  
  <chr>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl> <chr> <chr>
1 xx1           0        0        0        0        0 A     A    
2 xx2           0        1        1        1        1 B     B    
3 xx3           0        0        1        1        1 B     B    
4 xx4           0        0        0        1        1 B     B    
5 xx5           0        1        0        1        1 C     C    
6 xx6           0        1        0        1        0 C     C    

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Maël