Check a data frame with different functions (dplyr)
I am trying to write some functions that take a data frame and check whether certain variables fulfill certain criteria. For each check, I would like to create a new variable whose name starts with "check_" and holds the result of that check. Unfortunately, I am still struggling to get it right. Can someone help me?
library(dplyr)

# Some sample data
dat <- data.frame(Q1_1 = c(1, 1, 2, 5, 2, 1),
                  Q1_2 = c(1, 2, 3, 5, 1, 3),
                  Q1_3 = c(4, 3, 3, 5, 1, 3),
                  Q1_4 = c(4, 2, 2, 5, 1, 2),
                  Q1_5 = c(2, 2, 1, 5, 5, 4),
                  Q2_1 = c(1, 2, 1, 2, 1, 2),
                  Q2_2 = c(2, 1, 1, 1, 2, 1),
                  Q2_3 = c(1, 1, 1, 2, 2, 1),
                  age  = c(22, 36, 20, 27, 13, 9))
# Some checker functions
check_age <- function(.df, agevar = "age"){
  #' Function should check if the age value is within a certain range
  #' and create a new variable "check_age" giving the result of the check
  .df %>% mutate(check_age = ifelse(age > 100, FALSE, TRUE),
                 check_age = ifelse(age < 4, FALSE, TRUE))
  ???
}
check_sameAnswers <- function(.df, varname = "Q1_"){
  #' Function should check whether all sub-items of a question (e.g. Q1_1 to Q1_5) have the
  #' same values and create a new variable "check_sameAnswers" giving the result of the check.
  #' It should be TRUE if Q1_1, Q1_2, ... all have the value 5, for example, otherwise FALSE
  ???
}
# Apply checker functions to dataframe in "dplyr-style"
dat <- dat %>%
  check_age(agevar = "age") %>%
  check_sameAnswers(varname = "Q1_")
Solution 1
You can embrace a function argument with {{ }} ("curly-curly") so that a bare column name passed by the caller is evaluated inside the data frame (data masking):
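As a minimal illustration of embracing, here is a sketch with a hypothetical helper summarise_mean() and a small made-up data frame demo (neither is part of the question). The caller passes age without quotes, and {{ var }} forwards it into summarise(), where it is looked up as a column of data rather than as a global object.

```r
library(dplyr)

# Hypothetical helper: the caller passes a bare column name, and {{ }}
# forwards it into summarise(), where it is evaluated as a column of data.
summarise_mean <- function(data, var) {
  data %>% summarise(mean_value = mean({{ var }}))
}

demo <- data.frame(age = c(22, 36, 20))
res <- summarise_mean(demo, age)  # age is a column of demo, not a global object
res$mean_value
# [1] 26
```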
Functions
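library(dplyr)

# Adds a logical column "between": TRUE when age_var lies in [start, end] (inclusive).
# {{ }} lets the caller pass the age column as a bare name.
check_age <- function(data, age_var, start = 0, end = 0){
  data %>%
    mutate(between = {{age_var}} >= start & {{age_var}} <= end)
}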
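# Adds a logical column "same": TRUE when all columns whose names start with
# cols hold a single distinct value in that row
check_sameAnswers <- function(data, cols){
  data %>%
    rowwise() %>%
    mutate(same = length(unique(c_across(starts_with(cols)))) == 1) %>%
    ungroup()
}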
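rowwise() evaluates c_across() one row at a time, which can be slow on large data. A vectorized alternative, sketched below under the assumption that all answer columns are numeric, compares every selected column with the first one and flags rows that have no mismatch. The function name check_same_vectorized and the toy data answers_demo are hypothetical; rows containing NA give NA rather than FALSE.

```r
library(dplyr)

# Hypothetical vectorized variant of check_sameAnswers (no rowwise()).
check_same_vectorized <- function(data, prefix) {
  # Numeric matrix of the answer columns whose names start with prefix
  answers <- as.matrix(select(data, starts_with(prefix)))
  # answers != answers[, 1] recycles the first column down every column,
  # so a row counts as "same" when it has zero mismatches.
  data %>% mutate(same = rowSums(answers != answers[, 1]) == 0)
}

answers_demo <- data.frame(Q1_1 = c(1, 5, 2), Q1_2 = c(1, 5, 3), Q1_3 = c(4, 5, 3))
check_same_vectorized(answers_demo, "Q1")$same
# [1] FALSE  TRUE FALSE
```

The trade-off is readability: the rowwise() version states the intent directly, while this one avoids per-row evaluation.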
Use
dat %>%
  check_age(age, 30, 40) %>%
  check_sameAnswers(cols = "Q1")
# A tibble: 6 × 11
   Q1_1  Q1_2  Q1_3  Q1_4  Q1_5  Q2_1  Q2_2  Q2_3   age between same
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>   <lgl>
1     1     1     4     4     2     1     2     1    22 FALSE   FALSE
2     1     2     3     2     2     2     1     1    36 TRUE    FALSE
3     2     3     3     2     1     1     1     1    20 FALSE   FALSE
4     5     5     5     5     5     2     1     2    27 FALSE   TRUE
5     2     1     1     1     5     1     2     2    13 FALSE   FALSE
6     1     3     3     2     4     2     1     1     9 FALSE   FALSE
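The question passes column names as strings (agevar = "age"). A sketch of a variant that keeps that interface looks the column up with the .data pronoun (re-exported by dplyr from rlang) and uses dplyr::between(), which is inclusive on both ends. The default bounds 4 and 100 come from the question's own ifelse() conditions; the name check_age_str, its min_age/max_age arguments, and the two extra ages in ages_demo are illustrative assumptions.

```r
library(dplyr)

# Hypothetical string-based variant: agevar names the column as a string and
# is looked up with the .data pronoun instead of {{ }}.
# between() is inclusive, so ages equal to min_age or max_age count as valid.
check_age_str <- function(.df, agevar = "age", min_age = 4, max_age = 100) {
  .df %>%
    mutate(check_age = between(.data[[agevar]], min_age, max_age))
}

# The question's ages plus two made-up out-of-range values (2 and 120)
ages_demo <- data.frame(age = c(22, 36, 20, 27, 13, 9, 2, 120))
check_age_str(ages_demo, agevar = "age")$check_age
```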
Solution 2
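I think the problem is in your ifelse statements: in check_age, the second ifelse() overwrites the check_age column created by the first one, so only the age < 4 test is applied. Combine both conditions with | in a single ifelse(). Try this: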
library(dplyr)

check_age <- function(.df, agevar = "age"){
  #' Function should check if the age value is within a certain range
  #' and create a new variable "check_age" giving the result of the check
  # Both bounds in one condition; .data[[agevar]] reads the column named by agevar
  .df %>% mutate(check_age = ifelse(.data[[agevar]] > 100 | .data[[agevar]] < 4, FALSE, TRUE))
}
check_sameAnswers <- function(.df, varname = "Q1_"){
  #' Function should check whether all sub-items of a question (e.g. Q1_1 to Q1_5) have the
  #' same values and create a new variable "check_sameAnswers" giving the result of the check.
  #' It should be TRUE if Q1_1, Q1_2, ... all have the value 5, for example, otherwise FALSE
  # Compare the answers within each row, using the columns selected by varname
  answers <- select(.df, starts_with(varname))
  .df %>% mutate(check_sameAnswers = apply(answers, 1, function(x) length(unique(x)) == 1))
}
# Apply checker functions to dataframe in "dplyr-style"
dat <- dat %>%
  check_age(agevar = "age") %>%
  check_sameAnswers(varname = "Q1_")
dat
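To see why the two ifelse() calls in the question's check_age fail, the base-R sketch below mirrors them on a plain vector: inside mutate(), the second check_age = ... replaces the first, just as the second assignment to age_ok does here. The ages are made up (150 and 2 are out of range), and the names ages, age_ok, and age_ok_fixed are illustrative.

```r
ages <- c(22, 150, 2)  # made-up ages; 150 and 2 are out of range

# Question's version: the second ifelse() replaces the first result,
# so only the "age < 4" test survives.
age_ok <- ifelse(ages > 100, FALSE, TRUE)
age_ok <- ifelse(ages < 4, FALSE, TRUE)
age_ok
# [1]  TRUE  TRUE FALSE   (150 is wrongly accepted)

# One combined condition applies both bounds.
age_ok_fixed <- ifelse(ages > 100 | ages < 4, FALSE, TRUE)
age_ok_fixed
# [1]  TRUE FALSE FALSE
```

The ifelse() is not strictly needed here: !(ages > 100 | ages < 4) returns the same logical vector.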
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andre Wildberg |
| Solution 2 | |
