How to handle categorical data in R for regression?
I am new to R and want to know how to deal with categorical data. This is my data, where x1, x5, x6 and x7 are categorical:
y = c(1, 0, 1, 1, 1, 1, 1, 1 ,1 ,1 ,1, 1, 1 ,1 ,1 ,1, 0 ,1, 1, 1 ,0, 1 ,1, 1 ,1, 1, 1, 0, 0, 0 ,1 ,1, 0, 0)
x1 = c(0 ,1 ,4 ,1, 4, 1, 5 ,4 ,1 ,3 ,1 ,0 ,1 ,4 ,1 ,4, 1, 1, 0, 1, 1, 4, 1, 1, 1, 3, 1, 2, 1, 1, 2, 3, 4, 0)
x2 = c(2, 13, 7, 8, 12, 4, 2, 3, 7, 20, 8, 6, 8, 5, 2.5, 20, 3, 12, 8, 9, 9, 7, 6, 30, 8, 4, 13, 12, 14, 11, 18, 9, 5, 10)
x3 = c(8, 4, 7, 8, 5, 7, 12, 14, 9, 7, 8, 6, 4, 4, 11, 9, 5, 5, 5, 6, 8, 7, 5, 10, 6, 12, 4, 7, 3, 5, 4, 6, 6, 9)
x4 = c(25, 17, 16, 16, 17, 16, 16, 17, 16, 34, 16, 17, 16, 17, 16, 60, 13, 17, 31, 16, 17, 17, 16, 42, 16, 16, 17, 19, 17, 16, 25, 18, 22, 15)
x5 = c(0, 0, 0, 8, 0, 5, 0, 5, 6, 0, 5, 0, 8, 0, 0, 7, 0, 0, 0, 7, 8, 0, 5, 0, 5, 5, 5, 0, 0, 5, 7, 8, 3, 2)
x6 = c(0, 1, 1, 1, 0, 0, 1, 1, 2, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0)
x7 = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 3, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 3, 0, 0, 2, 1)
I thought model1 (below) would be correct because I have already made dummy variables, so I shouldn't need factors. But that doesn't let me see which levels are insignificant; it only tells me that the variable x1 as a whole is insignificant. If I want to see whether x1(0), x1(1) or x1(2) in particular is insignificant, what should I do?
model1 = glm(y~x1+x2+x3+x4+x5+x6+x7)
model2 = glm(y~factor(x1)+x2+x3+x4+factor(x5)+factor(x6)+factor(x7))
fx1 = factor(x1)
fx5 = factor(x5)
fx6 = factor(x6)
fx7 = factor(x7)
model3 = glm(y~fx1+x2+x3+x4+fx5+fx6+fx7)
Solution 1:[1]
ANOVA can be used to test whether a particular variable is significant:
y = c(1, 0, 1, 1, 1, 1, 1, 1 ,1 ,1 ,1, 1, 1 ,1 ,1 ,1, 0 ,1, 1, 1 ,0, 1 ,1, 1 ,1, 1, 1, 0, 0, 0 ,1 ,1, 0, 0)
x1 = c(0 ,1 ,4 ,1, 4, 1, 5 ,4 ,1 ,3 ,1 ,0 ,1 ,4 ,1 ,4, 1, 1, 0, 1, 1, 4, 1, 1, 1, 3, 1, 2, 1, 1, 2, 3, 4, 0)
x2 = c(2, 13, 7, 8, 12, 4, 2, 3, 7, 20, 8, 6, 8, 5, 2.5, 20, 3, 12, 8, 9, 9, 7, 6, 30, 8, 4, 13, 12, 14, 11, 18, 9, 5, 10)
x3 = c(8, 4, 7, 8, 5, 7, 12, 14, 9, 7, 8, 6, 4, 4, 11, 9, 5, 5, 5, 6, 8, 7, 5, 10, 6, 12, 4, 7, 3, 5, 4, 6, 6, 9)
x4 = c(25, 17, 16, 16, 17, 16, 16, 17, 16, 34, 16, 17, 16, 17, 16, 60, 13, 17, 31, 16, 17, 17, 16, 42, 16, 16, 17, 19, 17, 16, 25, 18, 22, 15)
x5 = c(0, 0, 0, 8, 0, 5, 0, 5, 6, 0, 5, 0, 8, 0, 0, 7, 0, 0, 0, 7, 8, 0, 5, 0, 5, 5, 5, 0, 0, 5, 7, 8, 3, 2)
x6 = c(0, 1, 1, 1, 0, 0, 1, 1, 2, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0)
x7 = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 3, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 3, 0, 0, 2, 1)
# treat the numbers as nominal
fx1 = factor(x1)
fx5 = factor(x5)
fx6 = factor(x6)
fx7 = factor(x7)
model <- lm(y~fx1+x2+x3+x4+fx5+fx6+fx7)
anova(model)
#> Analysis of Variance Table
#>
#> Response: y
#>           Df  Sum Sq Mean Sq F value  Pr(>F)
#> fx1        5 0.48109 0.09622  0.7360 0.60792
#> x2         1 0.00254 0.00254  0.0195 0.89092
#> x3         1 0.12262 0.12262  0.9379 0.34816
#> x4         1 0.19583 0.19583  1.4980 0.23985
#> fx5        6 2.02016 0.33669  2.5755 0.06424 .
#> fx6        1 0.08942 0.08942  0.6840 0.42117
#> fx7        3 1.24507 0.41502  3.1747 0.05496 .
#> Residuals 15 1.96091 0.13073
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Created on 2022-03-18 by the reprex package (v2.0.0)
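To look at individual levels rather than whole variables, one option is the coefficient table from summary(). The following is a minimal sketch that reuses the data and model above; it is not part of the original answer, and the name `coefs` is only illustrative. With R's default treatment contrasts, each factor level other than the reference level (the first level, here 0) gets its own row, and that row compares the level against the reference, so x1 = 0 has no row of its own.

```r
# Data from the question
y  <- c(1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0)
x1 <- c(0, 1, 4, 1, 4, 1, 5, 4, 1, 3, 1, 0, 1, 4, 1, 4, 1, 1, 0, 1, 1, 4, 1, 1, 1, 3, 1, 2, 1, 1, 2, 3, 4, 0)
x2 <- c(2, 13, 7, 8, 12, 4, 2, 3, 7, 20, 8, 6, 8, 5, 2.5, 20, 3, 12, 8, 9, 9, 7, 6, 30, 8, 4, 13, 12, 14, 11, 18, 9, 5, 10)
x3 <- c(8, 4, 7, 8, 5, 7, 12, 14, 9, 7, 8, 6, 4, 4, 11, 9, 5, 5, 5, 6, 8, 7, 5, 10, 6, 12, 4, 7, 3, 5, 4, 6, 6, 9)
x4 <- c(25, 17, 16, 16, 17, 16, 16, 17, 16, 34, 16, 17, 16, 17, 16, 60, 13, 17, 31, 16, 17, 17, 16, 42, 16, 16, 17, 19, 17, 16, 25, 18, 22, 15)
x5 <- c(0, 0, 0, 8, 0, 5, 0, 5, 6, 0, 5, 0, 8, 0, 0, 7, 0, 0, 0, 7, 8, 0, 5, 0, 5, 5, 5, 0, 0, 5, 7, 8, 3, 2)
x6 <- c(0, 1, 1, 1, 0, 0, 1, 1, 2, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0)
x7 <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 3, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 3, 0, 0, 2, 1)

# treat the numbers as nominal
fx1 <- factor(x1)
fx5 <- factor(x5)
fx6 <- factor(x6)
fx7 <- factor(x7)

model <- lm(y ~ fx1 + x2 + x3 + x4 + fx5 + fx6 + fx7)

# Coefficient table: estimate, standard error, t value and p-value per term.
# Factor rows are named <variable><level>, e.g. fx11 is x1 == 1 versus x1 == 0.
coefs <- summary(model)$coefficients
print(coefs)

# Only the rows that belong to the levels of fx1
print(coefs[grepl("^fx1", rownames(coefs)), ])
```

If a different baseline is more natural, the reference level can be changed before fitting, for example with relevel(fx1, ref = "1"). Note that several levels in this data occur only once or twice, so the per-level p-values should be read with caution.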
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | danlooo |
