Is there an R function for conditional values across different columns?
Suppose you have a dataframe that looks something like this:
library(tibble)
df <- tibble(PatientID = c(1,2,3,4,5),
             Treat1 = c("R", "O", "C", "O", "C"),
             Treat2 = c("O", "R", "R", NA, "O"),
             Treat3 = c("C", NA, "O", NA, "R"),
             Treat4 = c("H", NA, "H", NA, "H"),
             Treat5 = c("H", NA, NA, NA, "H"))
Treat1 through Treat5 are different treatments that a patient has had. I'm looking to create a new variable "Chemo" with 1 for yes and 0 for no, based on whether a patient has had treatment "C".
I've been using if_else(), but I have 10 different treatment variables in my actual dataset and would like to create such a column per treatment, so I wonder if I can do it without writing such long if statements. Is there an easier way to do this?
Solution 1:[1]
Use if_any to check the columns selected by starts_with("Treat"). The lambda builds a logical vector with %in% for each column, and if_any returns TRUE for a row if any of the selected columns contains "C" and FALSE otherwise. The logical result is then converted to binary with + (or as.integer).
library(dplyr)
df <- df %>%
  mutate(Chemo = +(if_any(starts_with("Treat"), ~ .x %in% "C")))
Output
df
# A tibble: 5 × 7
PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
<dbl> <chr> <chr> <chr> <chr> <chr> <int>
1 1 R O C H H 1
2 2 O R <NA> <NA> <NA> 0
3 3 C R O H <NA> 1
4 4 O <NA> <NA> <NA> <NA> 0
5 5 C O R H H 1
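The choice of %in% over == matters because of the NA entries. A minimal base R sketch, using a made-up vector x rather than the data above, shows the difference:

```r
# Illustrative vector (not from the original data) with a missing value
x <- c("C", NA, "O")

eq_result <- x == "C"     # == propagates NA: TRUE NA FALSE
in_result <- x %in% "C"   # %in% never returns NA: TRUE FALSE FALSE

# Unary + turns the logical vector into integers: 1 0 0
flag <- +in_result
```

Because %in% treats a missing value as "no match", no extra NA handling is needed inside the lambda.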
Or using base R with rowSums, where na.rm = TRUE skips the NA values that == produces for missing treatments:
df$Chemo <- +(rowSums(df[startsWith(names(df), "Treat")] == "C",
na.rm = TRUE) > 0)
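The question also asks for one such column per treatment. One possible sketch, extending the base R rowSums approach with a loop, is below. The has_ column names and the set of codes are assumptions for illustration, and the data frame is a base R copy of the example data.

```r
# Base R copy of the example data from the question
df <- data.frame(
  PatientID = c(1, 2, 3, 4, 5),
  Treat1 = c("R", "O", "C", "O", "C"),
  Treat2 = c("O", "R", "R", NA, "O"),
  Treat3 = c("C", NA, "O", NA, "R"),
  Treat4 = c("H", NA, "H", NA, "H"),
  Treat5 = c("H", NA, NA, NA, "H"),
  stringsAsFactors = FALSE
)

# Fix the treatment columns by name before any new columns are added
treat_cols <- names(df)[startsWith(names(df), "Treat")]

# Treatment codes to flag (assumed: every code seen in the example)
codes <- c("C", "R", "O", "H")

for (code in codes) {
  # Hypothetical naming scheme: has_C, has_R, ...
  df[[paste0("has_", code)]] <-
    +(rowSums(df[treat_cols] == code, na.rm = TRUE) > 0)
}
```

Selecting the treatment columns by name up front keeps the newly added has_ columns out of later comparisons.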
Solution 2:[2]
Another option is to use str_detect and any on a rowwise data frame to determine whether "C" occurs in any of the Treat columns for each row. The + converts the logical to an integer.
library(tidyverse)
df %>%
  rowwise() %>%
  mutate(Chemo = +any(str_detect(c_across(starts_with("Treat")), "C"), na.rm = TRUE)) %>%
  ungroup()
Output
PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
<dbl> <chr> <chr> <chr> <chr> <chr> <int>
1 1 R O C H H 1
2 2 O R NA NA NA 0
3 3 C R O H NA 1
4 4 O NA NA NA NA 0
5 5 C O R H H 1
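One point to keep in mind: str_detect matches a pattern anywhere in the string, so "C" would also match longer codes that contain a C. That is harmless for the single-letter codes here. A small sketch with a hypothetical code "CT" (not in the original data) shows the difference and one way to anchor the pattern:

```r
library(stringr)

# "CT" is a made-up code, used only to illustrate partial matching
x <- c("C", "CT", "O", NA)

partial <- str_detect(x, "C")     # TRUE TRUE FALSE NA
exact   <- str_detect(x, "^C$")   # TRUE FALSE FALSE NA

# With no match and a missing value, any() alone would return NA;
# na.rm = TRUE drops the NA so the result is FALSE
no_match <- any(str_detect(c("O", NA), "^C$"), na.rm = TRUE)
```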
Solution 3:[3]
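An alternative dplyr approach: recode each Treat column to 1/0 with case_when inside across, sum the helper columns per row, then drop them: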
library(dplyr)
df %>%
  mutate(across(starts_with("Treat"),
                ~ case_when(. == "C" ~ 1,
                            TRUE ~ 0),
                .names = "new_{col}")) %>%
  mutate(Chemo = rowSums(select(., starts_with("new")))) %>%
  select(-starts_with("new"))
Output
PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
<dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
1 1 R O C H H 1
2 2 O R NA NA NA 0
3 3 C R O H NA 1
4 4 O NA NA NA NA 0
5 5 C O R H H 1
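Note that rowSums here counts how many Treat columns equal "C", so the result is limited to 0 or 1 only as long as no patient has "C" recorded more than once. A hedged sketch with two made-up patients (not in the original data) shows the count and one way to reduce it to a 0/1 flag; the column name n_C is an assumption:

```r
library(dplyr)

# Made-up rows: patient 6 has "C" in two columns
df2 <- data.frame(
  PatientID = c(6, 7),
  Treat1 = c("C", "O"),
  Treat2 = c("C", NA)
)

res <- df2 %>%
  mutate(across(starts_with("Treat"),
                ~ case_when(.x == "C" ~ 1, TRUE ~ 0),
                .names = "new_{col}")) %>%
  mutate(n_C = rowSums(across(starts_with("new_"))),  # number of "C" entries
         Chemo = +(n_C > 0)) %>%                      # 1 if any "C", else 0
  select(-starts_with("new_"))
```

For patient 6 the count n_C is 2 while Chemo stays at 1.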
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | akrun |
| Solution 2 | AndrewGB |
| Solution 3 | TarJae |
