'Count multi-response answers aginst a vector in R

I have a multi-response question from a survey.

The data look like this:

|respondent| friend          |
|----------|-----------------|
| 001      | John, Mary      |
|002       | Sue, John, Peter|

Then, I want to count, for each respondent, how many male and female friends they have. I imagine I need to create separate vectors of male and female names, then check each cell in the friend column against these vectors and count.

Any help is appreciated.



Solution 1:[1]

This should be heavily caveated, because many common names are frequently used by different genders. Here I use the genders applied in american social security data in the babynames package as a proxy. Then I merge that with my data and come up with a weighted count based on likelihood. In the dataset, fairly common names including Casey, Riley, Jessie, Jackie, Peyton, Jaime, Kerry, and Quinn are almost evenly split between genders, so in my approach those add about half a female friend and half a male friend, which seems to me the most sensible approach when the name alone doesn't add much information about gender.

library(tidyverse) # using dplyr, tidyr
gender_freq <- babynames::babynames %>%
  filter(year >= 1930) %>% # limiting to people <= 92 y.o.
  count(name, sex, wt = n) %>%
  group_by(name) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

tribble(
  ~respondent, ~friend,
  "001", "John, Mary, Riley",
  "002", "Sue, John, Peter") %>%
  separate_rows(friend, sep = ", ") %>%
  left_join(gender_freq, by = c("friend" = "name")) %>%
  count(respondent, sex, wt = share)


## A tibble: 4 x 3
#  respondent sex       n
#  <chr>      <chr> <dbl>
#1 001        F      1.53
#2 001        M      1.47
#3 002        F      1.00
#4 002        M      2.00

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1