Eliminating values in CrossTable in R
I'm just getting started in R and I'm trying to wrap my head around the chi-square test for a university assignment.
Specifically, I am using the General Social Survey 2018 dataset (for codebook: https://www.thearda.com/Archive/Files/Codebooks/GSS2018_CB.asp), and I am trying to figure out if religion has any effect on the way people seek out help for mental health.
I want to use reliten (self-assessment of religiousness, from strong to no religion) as the independent variable, and mentloth (asks whether a person with mental health issues should reach out to a mental health professional, yes or no) as the dependent variable. Alongside the chi-square test, I also want to add CrossTable(GSS18$reliten, GSS18$mentloth), but I'm not sure how to take out the "Not applicable", "Don't know" and "No answer" values coded as 0, 8 and 9. Does anyone have any tips?
Below is a short preview of my data, if it helps.
structure(list(reliten = structure(c(1, 1, 4, 1, 1, 2, 1, 1,
4, 2, 2, 3, 2, 2, 4, 1, 4, 3, 2, 1, 2, 1, 2, 2, 1), label = "Would you call yourself a strong [religious preference] or a not very strong [re", format.stata = "%8.0g", labels = c(`Not applicable` = 0,
Strong = 1, `Not very strong` = 2, `Somewhat strong` = 3, `No religion` = 4,
`Don't know` = 8, `No answer` = 9), class = c("haven_labelled",
"vctrs_vctr", "double")), mentloth = structure(c(0, 1, 0, 1,
2, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0
), label = "Should [NAME] go to a therapist, or counselor, like a psychologist, social worke", format.stata = "%8.0g", labels = c(`Not applicable` = 0,
Yes = 1, No = 2, `Don't know` = 8, `No answer` = 9), class = c("haven_labelled",
"vctrs_vctr", "double"))), row.names = c(NA, -25L), class = c("tbl_df",
"tbl", "data.frame"))
Any help would be much appreciated!
Solution 1:[1]
The CrossTable function is from the gmodels package, which doesn't know how to handle objects of class haven_labelled, so it treats them as plain numeric vectors and the value labels are lost.
To get a nicer output, you can convert them into base R factors for CrossTable to retain the names. Fortunately, the haven package contains the function as_factor for doing exactly that.
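To see roughly what that conversion amounts to, here is a minimal base R sketch that does not use haven itself. It builds a small labelled-style vector by hand (the object `x` and its values are made up for illustration) and maps each numeric code to its label name through the `labels` attribute. `as_factor` does this for you and covers more cases, so treat this only as an illustration of the idea.

```r
# Toy vector mimicking how a haven_labelled column is stored:
# numeric codes plus a named "labels" attribute (values are illustrative).
x <- c(1, 2, 0, 1, 8)
attr(x, "labels") <- c("Not applicable" = 0, "Yes" = 1, "No" = 2,
                       "Don't know" = 8, "No answer" = 9)

typeof(x)  # the underlying data are plain doubles

# Map each code to its label name to obtain an ordinary factor.
labs <- attr(x, "labels")
f <- factor(as.numeric(x), levels = unname(labs), labels = names(labs))

levels(f)
table(f)
```

Once the codes are factor levels, a table prints the label names rather than the raw numbers.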
Once you have done that, it is easy to drop the factor levels you don't want, as shown below:
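library(gmodels)
library(haven)

# Keep only rows with a substantive answer to mentloth (drop codes 0, 8 and 9)
df <- GSS18[!GSS18$mentloth %in% c(0, 8, 9), ]

# Convert the labelled columns to ordinary factors so the labels are kept
df$reliten  <- as_factor(df$reliten)
df$mentloth <- as_factor(df$mentloth)

# Keep only the four substantive reliten levels, in order; any other
# level becomes NA, which CrossTable leaves out by default
df$reliten <- factor(as.character(df$reliten),
                     levels = c("No religion", "Somewhat strong",
                                "Not very strong", "Strong"))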
So now you can do
CrossTable(df$reliten, df$mentloth)
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  12

                | df$mentloth
     df$reliten |       Yes |        No | Row Total |
----------------|-----------|-----------|-----------|
    No religion |         1 |         0 |         1 |
                |     0.008 |     0.083 |           |
                |     1.000 |     0.000 |     0.083 |
                |     0.091 |     0.000 |           |
                |     0.083 |     0.000 |           |
----------------|-----------|-----------|-----------|
Somewhat strong |         1 |         0 |         1 |
                |     0.008 |     0.083 |           |
                |     1.000 |     0.000 |     0.083 |
                |     0.091 |     0.000 |           |
                |     0.083 |     0.000 |           |
----------------|-----------|-----------|-----------|
Not very strong |         3 |         0 |         3 |
                |     0.023 |     0.250 |           |
                |     1.000 |     0.000 |     0.250 |
                |     0.273 |     0.000 |           |
                |     0.250 |     0.000 |           |
----------------|-----------|-----------|-----------|
         Strong |         6 |         1 |         7 |
                |     0.027 |     0.298 |           |
                |     0.857 |     0.143 |     0.583 |
                |     0.545 |     1.000 |           |
                |     0.500 |     0.083 |           |
----------------|-----------|-----------|-----------|
   Column Total |        11 |         1 |        12 |
                |     0.917 |     0.083 |           |
----------------|-----------|-----------|-----------|
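Since the question also asks about the chi-square test itself, here is a hedged base R sketch of the same filtering followed by `chisq.test` from the stats package. It uses the 25 preview rows from the question as plain numeric codes and drops 0, 8 and 9 from both variables; the names `missing_codes`, `keep`, `rel`, `ment` and `tab` are only illustrative. This is an alternative to the haven-based route above, not a replacement for it.

```r
# The 25 preview rows from the question, as plain numeric codes.
reliten  <- c(1, 1, 4, 1, 1, 2, 1, 1, 4, 2, 2, 3, 2, 2, 4, 1, 4, 3, 2, 1,
              2, 1, 2, 2, 1)
mentloth <- c(0, 1, 0, 1, 2, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1,
              0, 1, 0, 0, 0)

# Codes treated as missing in both variables.
missing_codes <- c(0, 8, 9)
keep <- !(reliten %in% missing_codes) & !(mentloth %in% missing_codes)

# Recode the remaining values as factors with readable labels.
rel <- factor(reliten[keep], levels = c(4, 3, 2, 1),
              labels = c("No religion", "Somewhat strong",
                         "Not very strong", "Strong"))
ment <- factor(mentloth[keep], levels = c(1, 2), labels = c("Yes", "No"))

tab <- table(reliten = rel, mentloth = ment)
tab

# Pearson chi-square test. With counts this small R warns that the
# approximation may be inaccurate, so the result is illustrative only.
res <- suppressWarnings(chisq.test(tab))
res$statistic
res$parameter  # degrees of freedom
```

On the full dataset the warning should matter less, but it is still worth checking the expected cell counts. If you prefer to stay within gmodels, CrossTable also accepts chisq = TRUE to print the test under the table.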
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
