How to handle categorical data in R for regression?

I am new to R and want to know how to deal with categorical data. This is my data, where x1, x5, x6 and x7 are categorical:

y = c(1, 0, 1, 1, 1, 1, 1, 1 ,1 ,1 ,1, 1, 1 ,1 ,1 ,1, 0 ,1, 1, 1 ,0, 1 ,1, 1 ,1, 1, 1, 0, 0, 0 ,1 ,1, 0, 0)
x1 = c(0 ,1 ,4 ,1, 4, 1, 5 ,4 ,1 ,3 ,1 ,0 ,1 ,4 ,1 ,4, 1, 1, 0, 1, 1, 4, 1, 1, 1, 3, 1, 2, 1, 1, 2, 3, 4, 0)
x2 = c(2, 13, 7, 8, 12, 4, 2, 3, 7, 20, 8, 6, 8, 5, 2.5, 20, 3, 12, 8, 9, 9, 7, 6, 30, 8, 4, 13, 12, 14, 11, 18, 9, 5, 10)
x3 = c(8, 4, 7, 8, 5, 7, 12, 14, 9, 7, 8, 6, 4, 4, 11, 9, 5, 5, 5, 6, 8, 7, 5, 10, 6, 12, 4, 7, 3, 5, 4, 6, 6, 9)
x4 = c(25, 17, 16, 16, 17, 16, 16, 17, 16, 34, 16, 17, 16, 17, 16, 60, 13, 17, 31, 16, 17, 17, 16, 42, 16, 16, 17, 19, 17, 16, 25, 18, 22, 15)
x5 = c(0, 0, 0, 8, 0, 5, 0, 5, 6, 0, 5, 0, 8, 0, 0, 7, 0, 0, 0, 7, 8, 0, 5, 0, 5, 5, 5, 0, 0, 5, 7, 8, 3, 2)
x6 = c(0, 1, 1, 1, 0, 0, 1, 1, 2, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0)
x7 = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 3, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 3, 0, 0, 2, 1)

I thought model1 would be correct because I have already made dummy variables, so I shouldn't need factors. But this doesn't let me see which levels are insignificant; it only tells me that the variable x1 as a whole is insignificant. If I want to see whether x1(0), x1(1), x1(2), etc. are insignificant individually, what should I do?

model1 = glm(y~x1+x2+x3+x4+x5+x6+x7)

model2 = glm(y~factor(x1)+x2+x3+x4+factor(x5)+factor(x6)+factor(x7))


fx1 = factor(x1) 
fx5 = factor(x5)
fx6 = factor(x6)
fx7 = factor(x7)


model3 =  glm(y~fx1+x2+x3+x4+fx5+fx6+fx7)

r regression

Solution 1: [1]

ANOVA can be used to test whether a particular variable, taken as a whole, is significant:

y = c(1, 0, 1, 1, 1, 1, 1, 1 ,1 ,1 ,1, 1, 1 ,1 ,1 ,1, 0 ,1, 1, 1 ,0, 1 ,1, 1 ,1, 1, 1, 0, 0, 0 ,1 ,1, 0, 0)
x1 = c(0 ,1 ,4 ,1, 4, 1, 5 ,4 ,1 ,3 ,1 ,0 ,1 ,4 ,1 ,4, 1, 1, 0, 1, 1, 4, 1, 1, 1, 3, 1, 2, 1, 1, 2, 3, 4, 0)
x2 = c(2, 13, 7, 8, 12, 4, 2, 3, 7, 20, 8, 6, 8, 5, 2.5, 20, 3, 12, 8, 9, 9, 7, 6, 30, 8, 4, 13, 12, 14, 11, 18, 9, 5, 10)
x3 = c(8, 4, 7, 8, 5, 7, 12, 14, 9, 7, 8, 6, 4, 4, 11, 9, 5, 5, 5, 6, 8, 7, 5, 10, 6, 12, 4, 7, 3, 5, 4, 6, 6, 9)
x4 = c(25, 17, 16, 16, 17, 16, 16, 17, 16, 34, 16, 17, 16, 17, 16, 60, 13, 17, 31, 16, 17, 17, 16, 42, 16, 16, 17, 19, 17, 16, 25, 18, 22, 15)
x5 = c(0, 0, 0, 8, 0, 5, 0, 5, 6, 0, 5, 0, 8, 0, 0, 7, 0, 0, 0, 7, 8, 0, 5, 0, 5, 5, 5, 0, 0, 5, 7, 8, 3, 2)
x6 = c(0, 1, 1, 1, 0, 0, 1, 1, 2, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0)
x7 = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 3, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 3, 0, 0, 2, 1)

# treat the numbers as nominal
fx1 = factor(x1) 
fx5 = factor(x5)
fx6 = factor(x6)
fx7 = factor(x7)

model <- lm(y~fx1+x2+x3+x4+fx5+fx6+fx7)
anova(model)
#> Analysis of Variance Table
#> 
#> Response: y
#>           Df  Sum Sq Mean Sq F value  Pr(>F)  
#> fx1        5 0.48109 0.09622  0.7360 0.60792  
#> x2         1 0.00254 0.00254  0.0195 0.89092  
#> x3         1 0.12262 0.12262  0.9379 0.34816  
#> x4         1 0.19583 0.19583  1.4980 0.23985  
#> fx5        6 2.02016 0.33669  2.5755 0.06424 .
#> fx6        1 0.08942 0.08942  0.6840 0.42117  
#> fx7        3 1.24507 0.41502  3.1747 0.05496 .
#> Residuals 15 1.96091 0.13073                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Created on 2022-03-18 by the reprex package (v2.0.0)
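
The ANOVA table above gives one test per variable. To look at individual levels, as the question asks, one option is the coefficient table from summary(), which has one row per non-reference level of each factor. Each row compares that level with the reference level (by default the first level, here 0). The following is a minimal sketch rather than a definitive analysis: it uses only y, x1 and x2 from the question to stay short, and the choice of level 1 as an alternative baseline (and the names fx1_ref1 and model_ref1) are assumptions made purely for illustration.

```r
# Data copied from the question (only y, x1 and x2, to keep the sketch short)
y  <- c(1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0)
x1 <- c(0, 1, 4, 1, 4, 1, 5, 4, 1, 3, 1, 0, 1, 4, 1, 4, 1, 1, 0, 1, 1, 4, 1, 1, 1, 3, 1, 2, 1, 1, 2, 3, 4, 0)
x2 <- c(2, 13, 7, 8, 12, 4, 2, 3, 7, 20, 8, 6, 8, 5, 2.5, 20, 3, 12, 8, 9, 9, 7, 6, 30, 8, 4, 13, 12, 14, 11, 18, 9, 5, 10)

# Treat the numeric codes as nominal; level "0" becomes the reference level
fx1 <- factor(x1)

model <- lm(y ~ fx1 + x2)

# One row per non-reference level (fx11, fx12, ...), each compared with level 0;
# the last column holds the per-level p-values
coefs <- summary(model)$coefficients
print(coefs)

# Assumption for illustration: use level "1" as the baseline instead of "0"
fx1_ref1 <- relevel(fx1, ref = "1")
model_ref1 <- lm(y ~ fx1_ref1 + x2)
print(summary(model_ref1)$coefficients)
```

These per-level p-values are always relative to the chosen baseline, so changing the reference level with relevel() changes which comparisons are reported without changing the overall fit.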

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	danlooo
