'Automating conditional logic for database data checks in R
I am trying to do a large data check for a database. Some fields in the database are hidden, so when I am doing the datacheck, I need to ignore all hidden fields. Fields are hidden based on conditional logic stored in the database. I have exported this conditional logic and have stored it in a dataframe in R. Now I need to automate the data check by somehow using the text string of a conditional argument to automate the script writing itself, which I do not think is possible, or finding a way around this problem.
Below is example code that I need to solve:
id <- c(1001, 1002, 1003, 1004, 1005, 1001, 1002, 1003, 1004, 1005)
target_var <- c("race","race","race","race","race", "race_other",
"race_other", "race_other", "race_other", "race_other")
value <- c(1, NA, 1, 1, 6, NA, NA, NA, NA, "Asian")
branching_logic <- c(NA, NA, NA, NA, NA,
"race == 6", "race == 6", "race == 6",
"race == 6", "race == 6")
race <- c(NA, NA, NA,NA, NA, 1, 1, 1, 6, 6)
data <- data.frame(id, var, value, branching_logic, race) %>%
mutate(data_check_result = case_when(
!is.na(value) ~ "No Missing Data",
is.na(value) & is.na(branching_logic) ~ "Missing Data 1",
is.na(value) & race == 6 ~ "Missing Data 2",
is.na(value) & race != 6 ~ "Hidden field",
))
It would be great if I could replace (race==6) with a variable or somehow directing the script to the conditional expression already saved as a string, but I know that R can't do that.
The above problem has four categories which the data could fall into:
- No Missing Data: only if value is non-na
- Missing Data 1: if the value is NA, and there is no branching logic that hid the variable.
- Missing Data 2: if the value is NA and the branching logic is met to show the field
- Hidden Field: if the value is NA and the branching logic is NOT net to show the field
I have thousands of fields to check with accompanying branching logic, so I need a way to use the branching logic saved in the "branching_logic" column within the script.
IMPORTANT NOTE: The case here is the simplest case. Many target_var variables and value variables have branching logic that looks at multiple other variables to determine whether to hide the field (Ex. race==6 & race==1)
This is only my second time posting, and I usually do not see such in depth problems here, but it would be great if someone has an idea!
Solution 1:[1]
You can store the expression
you want to eval
uate as a string
if you pass it into parse()
first as explained in this answer.
Here's a simple example of how you can store the expression
in a column and then feed it to dplyr::case_when()
.
library(tidyverse)
set.seed(1)
d <- tibble(
a = sample(10),
b = sample(10),
c = "a > b"
)
d %>%
mutate(a_bigger = case_when(
eval(parse(text = c)) ~ "Y",
TRUE ~ "N"
))
#> # A tibble: 10 x 4
#> a b c a_bigger
#> <int> <int> <chr> <chr>
#> 1 9 3 a > b Y
#> 2 4 1 a > b Y
#> 3 7 5 a > b Y
#> 4 1 8 a > b N
#> 5 2 2 a > b N
#> 6 5 6 a > b N
#> 7 3 10 a > b N
#> 8 10 9 a > b Y
#> 9 6 4 a > b Y
#> 10 8 7 a > b Y
Created on 2022-03-07 by the reprex package (v2.0.1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |