How do I pass arguments to srvyr inside of a function?

So I'm using srvyr to calculate survey means of a variable (y) from a survey object, grouped by a categorical variable (x) from that same survey object. The basic code looks like this:

survey_means <- survey_object %>%
 filter( #remove NAs) %>%
 group_by(x) %>%
 summarise(Mean = survey_mean(y)) 

Suppose I instead want to put this block of code inside a function that accepts the survey object and two variables as parameters. This is a simplified version of what I'm actually trying to do (a function that handles a group of up to 4 or so variables), but this is the base case:

SurveyMeanFunc <- function(survey_object, x, y) {

survey_means <- survey_object %>%
 filter( #remove NAs ) %>%
 group_by(survey_object[["variables"]][[x]]) %>%
 summarise(Mean = survey_mean(survey_object[["variables"]][[y]]))
 
return(survey_means) 

}

Whenever I try to use this function, I get an error message along the lines of:

! Assigned data `x` must be compatible with existing data.
x Existing data has n rows.
x Assigned data has m rows. (m > n)
i Only vectors of size 1 are recycled.

Even when I split up the pipes and verify that x has the same number of rows as y right before the summarise() call, I still get this message. What is summarise() doing that I don't understand?

[EDIT] Full context, with the suggested changes:

SurveyMeanMedFunc <- function(survey_obj, xvar, yvar, categ1= NULL, categ2= NULL) {
  
  if (is.null(categ1) & is.null(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
              
  } else if (is.null(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  } else {
    
    NULL #fix
    
  }
  
  return(survey_estimate)
  
}

The remaining issue is this: using quasiquotation to reference the survey variables works in the first branch of this if/else statement, but the function parameters are not recognised inside the following else if block, even though they are handled the same way with {{ }}.



Solution 1:[1]

You don't give an example of how you want to use the function, but if I understand correctly, you want to take your first block of code and run it with x replaced by the name of the variable passed in as the x argument, and y replaced by the name of the variable passed in as the y argument (with the 'remove NAs' line either deleted or fixed to do something).

That is, you want SurveyMeanFunc(my_design, species, height) to be:

my_design %>%
 group_by(species) %>%
 summarise(Mean = survey_mean(height)) 

This is complicated because you don't want the value of x or the name x; you want the name species.
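It may help to see where the question's error message comes from. The [[x]] indexing implies x and y were passed as strings; assuming so, survey_object[["variables"]][[x]] refers to the original, unfiltered column, while filter() has already dropped rows from the design that group_by()/summarise() are working on. A minimal sketch with a made-up five-row data frame (the columns grp, score and w are hypothetical) shows the mismatch: four rows in the filtered design, five values in the external vector, and only length-1 vectors get recycled.

```r
library(srvyr)

# Hypothetical toy data: one row has a missing group value
df <- data.frame(
  grp = c("a", "a", "b", "b", NA),
  score = c(1, 3, 5, 7, 9),
  w = 1
)
des <- as_survey(df, weights = w)

# filter() drops the NA row, leaving 4 rows in the design
filtered <- des %>% filter(!is.na(grp))
n_filtered <- nrow(filtered[["variables"]])

# Indexing the *original* object still returns all 5 values
n_full <- length(des[["variables"]][["grp"]])

# Using that external vector inside the pipeline hands group_by() 5 values
# for a 4-row design; only length-1 vectors are recycled, so this errors
res_bad <- tryCatch(
  filtered %>% group_by(des[["variables"]][["grp"]]),
  error = function(e) e
)

n_filtered                  # 4
n_full                      # 5
inherits(res_bad, "error")  # TRUE
```

This is the same shape as "Existing data has n rows / Assigned data has m rows (m > n)" in the question: the column has to be looked up inside the design being piped, not pulled from the object outside it.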

One way is quasiquotation, which used to require enquo() and !!, but can now be done more easily with the {{ }} ("curly-curly") operator:

SurveyMeanFunc <- function(survey_object, x, y) {
  survey_means <- survey_object %>%
    group_by({{ x }}) %>%
    summarise(Mean = survey_mean({{ y }}))
  survey_means
}

giving

> dstrata <- apistrat %>%
+   as_survey(strata = stype, weights = pw)
> 
> SurveyMeanFunc(dstrata, stype, api00)
# A tibble: 3 × 3
  stype  Mean Mean_se
  <fct> <dbl>   <dbl>
1 E      674.    12.5
2 H      626.    15.5
3 M      637.    16.6
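For comparison, here is a sketch of the older enquo()/!! form that {{ }} replaces, plus a variant for callers who prefer to pass column names as strings (closer to the original [[x]] version), using rlang::sym() to turn the string into a symbol before injecting it with !!. The function names SurveyMeanFuncEnquo and SurveyMeanFuncStr and the toy columns grp, score and w are hypothetical, made up for illustration.

```r
library(srvyr)

# Pre-{{ }} style: enquo() captures the caller's bare name as a quosure,
# and !! injects it into the srvyr verbs
SurveyMeanFuncEnquo <- function(survey_object, x, y) {
  x <- rlang::enquo(x)
  y <- rlang::enquo(y)
  survey_object %>%
    group_by(!!x) %>%
    summarise(Mean = survey_mean(!!y))
}

# String variant: rlang::sym() turns "grp" into the symbol grp, which
# !! then injects, so the column is looked up inside the design
SurveyMeanFuncStr <- function(survey_object, x, y) {
  x <- rlang::sym(x)
  y <- rlang::sym(y)
  survey_object %>%
    filter(!is.na(!!x), !is.na(!!y)) %>%
    group_by(!!x) %>%
    summarise(Mean = survey_mean(!!y))
}

# Hypothetical toy data with equal weights
df <- data.frame(grp = c("a", "a", "b", "b"), score = c(1, 3, 5, 7), w = 1)
des <- as_survey(df, weights = w)

res_enquo <- SurveyMeanFuncEnquo(des, grp, score)
res_str <- SurveyMeanFuncStr(des, "grp", "score")
as.numeric(res_enquo$Mean)  # 2 6
as.numeric(res_str$Mean)    # 2 6
```

Both give the same group means as the {{ }} version would; {{ x }} is essentially shorthand for !!enquo(x).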

Update

You still don't give an example of how you want to use the function, but I think this works:

SurveyMeanMedFunc <- function(survey_obj, xvar, yvar, categ1, categ2) {
  
  if (missing(categ1) && missing(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
              
  } else if (missing(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  } else {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ categ2 }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  }
  
  return(survey_estimate)
  
}

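The issue is that is.null(categ1) has to evaluate categ1 in order to test it. If the user supplies a bare column name such as sch.wide, R evaluates it as an ordinary variable in the calling environment rather than inside the survey object, so it doesn't know where to look and you get an "object not found" error. (The first branch only appeared to work because categ1 and categ2 were left at their NULL defaults.) This is a consequence of the way the tidyverse uses unquoted variable names; if you supplied them as model formulas (as you would in survey) or as quoted strings, you'd be OK.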
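The difference can be seen without any survey object at all. In this minimal sketch the helper names f_isnull, f_missing and f_quo are hypothetical, and sch.wide is deliberately not defined anywhere, which mimics a column that only exists inside the design. The last helper shows an rlang option (quo_is_null() on an enquo()-captured argument) that would let you keep the question's categ1 = NULL defaults; it is offered as an alternative, not what the function above uses.

```r
# With a NULL default, is.null() must evaluate the argument; a bare name
# like sch.wide that exists only inside a survey object makes this error
f_isnull <- function(categ1 = NULL) is.null(categ1)

# missing() only asks whether the caller supplied the argument; the
# argument itself is never evaluated
f_missing <- function(categ1) missing(categ1)

# rlang alternative that keeps the NULL default: capture the argument
# unevaluated with enquo(), then test whether it is literally NULL
f_quo <- function(categ1 = NULL) rlang::quo_is_null(rlang::enquo(categ1))

r_isnull <- tryCatch(f_isnull(sch.wide), error = function(e) "error")

r_isnull             # "error"
f_missing()          # TRUE
f_missing(sch.wide)  # FALSE
f_quo()              # TRUE
f_quo(sch.wide)      # FALSE
```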

missing() asks whether an argument was supplied, which is what you want here. The rlang package has a more flexible is_missing()/maybe_missing() setup that you could look at as another option. But this seems to work:

> SurveyMeanMedFunc(dstrata,stype,enroll,sch.wide,comp.imp)
# A tibble: 4 × 5
# Groups:   comp.imp [2]
  comp.imp sch.wide  Mean Mean_low Mean_upp
  <fct>    <fct>    <dbl>    <dbl>    <dbl>
1 No       No       1013.     810.    1216.
2 No       Yes       525.     438.     611.
3 Yes      No        370.     207.     533.
4 Yes      Yes       521.     475.     566.
> SurveyMeanMedFunc(dstrata,stype,enroll,sch.wide)
# A tibble: 6 × 5
# Groups:   stype [3]
  stype sch.wide  Mean Mean_low Mean_upp
  <fct> <fct>    <dbl>    <dbl>    <dbl>
1 E     No        420.     340.     499.
2 E     Yes       417.     381.     452.
3 H     No       1520.    1209.    1830.
4 H     Yes      1137.     946.    1328.
5 M     No        967.     709.    1226.
6 M     Yes       775.     669.     881.
> SurveyMeanMedFunc(dstrata,stype,enroll)
# A tibble: 3 × 4
  stype  Mean Mean_low Mean_upp
  <fct> <dbl>    <dbl>    <dbl>
1 E      417.     384.     450.
2 H     1321.    1134.    1508.
3 M      832.     722.     943.
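Since the real goal is a function that handles up to 4 or so grouping variables, another option you could consider (a different approach from the if/else above) is to forward the grouping columns through R's `...`, which group_by() accepts as-is, so no branching on missing arguments is needed. A minimal sketch, assuming the grouping columns are passed after yvar; SurveyMeanDots and the toy columns g, h, score and w are hypothetical, and it only drops NAs in yvar, not in the grouping columns.

```r
library(srvyr)

# Forward any number of grouping columns through `...`
SurveyMeanDots <- function(survey_obj, yvar, ...) {
  survey_obj %>%
    filter(!is.na({{ yvar }})) %>%   # NA filtering for grouping columns omitted
    group_by(...) %>%
    summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
}

# Hypothetical toy data: 8 rows, equal weights, two crossed factors
df <- data.frame(
  g = rep(c("a", "b"), each = 4),
  h = rep(c("x", "y"), times = 4),
  score = 1:8,
  w = 1
)
des <- as_survey(df, weights = w)

res1 <- SurveyMeanDots(des, score, g)     # one grouping variable
res2 <- SurveyMeanDots(des, score, g, h)  # two grouping variables
as.numeric(res1$Mean)  # 2.5 6.5
```

Adding a third or fourth grouping column is then just another argument in the call rather than another branch in the function.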

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1