R Error in Linear Regression Model Prediction, and Redundancy in My Code
R novice here. I'm working on a project to evaluate whether perceived stress differs by gender (Male = 0, Female = 1). I'm learning the statistics and the code at the same time, so I suspect there's some redundancy in my code. I'm using income, education, and activity level as covariates to build a predictive model.
The data set is named Data. Gender is coded 0/1, perceived stress is a 0-20 score (treated as continuous), income has 4 categories (coded 1-4), education is 0/1 (HSgrad), and activity level is a 0-5 scale (activeIndex). I have separate code that compares mean perceived stress between the gender groups with a two-sample t test. I'm also working on a regression model; I believe linear regression is appropriate here, but I'm running into some issues.
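Because gender is coded 0/1, the t test and the simple regression overlap. With `var.equal = TRUE` (R's `t.test()` defaults to Welch's unequal-variance test), the slope of `lm(perceivedStress ~ gender)` equals the difference in group means, and its t statistic and p-value match the t test's. A minimal sketch on simulated data (the `sim` data frame is made up for illustration, not the study data):

```r
# Simulated data for illustration only; these are NOT the study data
set.seed(1)
sim <- data.frame(
  gender = rep(c(0, 1), each = 50),             # 0 = Male, 1 = Female
  perceivedStress = c(rnorm(50, mean = 6, sd = 3),
                      rnorm(50, mean = 7, sd = 3))
)

# Student's two-sample t test (pooled variance)
tt <- t.test(perceivedStress ~ factor(gender), data = sim, var.equal = TRUE)

# Simple linear regression on the 0/1 indicator
fit <- lm(perceivedStress ~ gender, data = sim)
coefs <- summary(fit)$coefficients

# The slope equals the difference in group means (Female minus Male)
group_means <- tapply(sim$perceivedStress, sim$gender, mean)
print(c(slope = unname(coef(fit)["gender"]),
        mean_diff = unname(group_means["1"] - group_means["0"])))

# The t statistics differ only in sign (t.test() computes Male minus Female
# here), and the two-sided p-values are identical
print(c(lm_t = coefs["gender", "t value"], t_test = unname(tt$statistic)))
print(c(lm_p = coefs["gender", "Pr(>|t|)"], t_test_p = tt$p.value))
```

The regression form is the one that extends to adding covariates such as income and education.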
The error message is:
Error:
Assigned data `predict(full, Data = new, na.omit = TRUE)` must be compatible with existing data.
x Existing data has 2653 rows.
x Assigned data has 2243 rows.
Only vectors of size 1 are recycled.
Backtrace:
- ``base::`$<-`(`*tmp*`, lmprediction, value = <dbl>)``
- tibble `<fn>(<vctrs___>)`
How can I adjust this so the linear prediction runs? Also, I'm sure I've missed something, so if you notice anything wrong, missing, or redundant, please let me know. Thanks!
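The row counts in the error point at missing values. `predict()` for an `lm` fit has no `Data` or `na.omit` argument (the data argument is called `newdata`), so both end up in `...` and are ignored. `predict(full)` then returns one fitted value per row actually used in the fit, which is only the complete cases (2243 here), and that cannot be assigned as a column of the 2653-row tibble. The `drop_na()` line further down is not assigned back to `Data`, so the incomplete rows are still there. A minimal sketch of two ways to get one prediction per row, on a hypothetical six-row `toy` data frame:

```r
# Hypothetical toy data with one missing Income value (not the study data)
toy <- data.frame(
  perceivedStress = c(12, 7, 0, 9, 10, 0),
  gender          = c(0, 1, 1, 1, 1, 0),
  Income          = c(1, 3, 4, 2, 2, NA)
)

# Default na.action = na.omit: row 6 is dropped before fitting
fit_omit <- lm(perceivedStress ~ gender + Income, data = toy)
print(length(predict(fit_omit)))  # 5 values for a 6-row data frame

# Option 1: supply the data through `newdata`; with predict.lm's default
# na.action = na.pass, rows with a missing predictor get NA
toy$pred_newdata <- predict(fit_omit, newdata = toy)

# Option 2: fit with na.exclude, so predict()/fitted()/residuals()
# are padded with NA back to the original number of rows
fit_excl <- lm(perceivedStress ~ gender + Income, data = toy,
               na.action = na.exclude)
toy$pred_exclude <- predict(fit_excl)

print(toy)
```

Both columns hold the same predictions for the complete rows and NA for the row with missing income.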
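Data sample (a tibble, 6 × 6):

| | age `<dbl>` | Income `<dbl>` | HSgrad `<dbl>` | activeIndex `<dbl>` | perceivedStress `<fct>` | gender `<dbl>` |
|---|---|---|---|---|---|---|
| 1 | 63.4 | 1 | 0 | 1.75 | 12 | 0 |
| 2 | 56.0 | 3 | 1 | 2 | 7 | 1 |
| 3 | 56.5 | 4 | 1 | 2.75 | 0 | 1 |
| 4 | 40.0 | 2 | 1 | 2.75 | 9 | 1 |
| 5 | 47.7 | 2 | 0 | 1 | 10 | 1 |
| 6 | 68.1 | NA | 0 | 2.5 | 0 | 0 |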
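The sample shows `perceivedStress` stored as `<fct>` (from the `factor()` call in the script) and `Income` as `<dbl>`, which is the reverse of what the description calls for. `lm()` needs a numeric response, and `as.numeric()` on a factor returns the internal level codes rather than the original scores, so `as.numeric(perceivedStress)` in the `full` model regresses on codes 1, 2, 3, ... instead of the 0-20 values. For income, one option, assuming the codes 1-4 are categories rather than evenly spaced amounts, is to wrap it in `factor()` so `lm()` estimates a separate effect per category. A sketch using the six sample rows:

```r
# perceivedStress and Income from the six sample rows in the question
stress_fct <- factor(c(12, 7, 0, 9, 10, 0))   # as stored after factor()
income     <- c(1, 3, 4, 2, 2, NA)

# as.numeric() on a factor returns level codes, not the original scores
print(levels(stress_fct))               # "0" "7" "9" "10" "12"
print(as.numeric(stress_fct))           # 5 2 1 3 4 1

# Convert via the labels to recover the 0-20 scores
stress_num <- as.numeric(as.character(stress_fct))
print(stress_num)                       # 12 7 0 9 10 0

# Treat the 1-4 income codes as categories: model.matrix() shows an
# intercept plus one dummy column per level after the first
# (the row with missing income is dropped by the default na.action)
income_fct <- factor(income, levels = 1:4)
print(colnames(model.matrix(~ income_fct)))
```

Simply not calling `factor()` on `perceivedStress` in the first place avoids the conversion step entirely.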
gender<- ifelse(dfJHS$sex=="Male",0,1)
dfJHS$gender <- gender
View(dfJHS)
Data<-dfJHS %>% select(-sex)
View(Data)
dim(Data)
Data$perceivedStress <- factor(Data$perceivedStress)
#Remove NA
Data %>% drop_na()
Data[complete.cases(Data),]
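Both NA-removal lines above return a new data frame rather than modifying `Data` in place, so without assigning the result `Data` keeps its incomplete rows. A base-R sketch on a hypothetical three-row data frame (`tidyr::drop_na()` behaves the same way with respect to assignment):

```r
# Hypothetical data frame with one incomplete row
toy <- data.frame(Income = c(1, NA, 3), gender = c(0, 1, 1))

toy[complete.cases(toy), ]    # returns a 2-row data frame that is not stored
print(nrow(toy))              # 3: toy itself is unchanged

# Assign the result back to keep only the complete rows
toy <- toy[complete.cases(toy), ]
print(nrow(toy))              # 2
```

`drop_na()` also accepts column names, e.g. `Data <- drop_na(Data, perceivedStress, gender, age, Income, HSgrad)`, which limits the check to the variables used in the models instead of every column.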
#section with data visualizations you probably won't need for this (lots of histograms, shapiro test, and a qq plot)
#check linear fit for two primary variables and perform linear regression.
model <-lm(perceivedStress ~ gender, data = Data)
summary(model)
#Checking this data meets assumptions for a linear regression.
aug <- augment(model)
resids <- residuals(model)
fitted <- fitted(model)
#Convert primary dependent variable to factor for analysis
Data$perceivedStress <- factor(Data$perceivedStress)
#check linear fit for two primary variables and perform linear regression.
model <-lm(perceivedStress ~ gender, data = Data)
summary(model)
#Checking this data meets assumptions for a linear regression.
aug <- augment(model)
resids <- residuals(model)
fitted <- fitted(model)
##Assumption 1: Residuals Normally Distributed
ggplot(aug) + geom_histogram(aes(x=.resid),
bins=15)
ggplot(aug) + geom_qq(aes(sample=.resid))
##Assumption 2: Homoscedasticity
ggplot(aug) + geom_point(aes(x=.fitted, y=.resid)) +
geom_hline(yintercept=0, lty=2)+
theme_bw()
##Assumption 4: Linear Relationship
ggplot(aug, aes(x=gender, y=perceivedStress)) + geom_point()+
geom_smooth(method = "lm", se = FALSE)+
theme_bw()
#determine if glm is better fit - No notable differences due to no change in complexity.
mod <- glm(perceivedStress ~ gender, data=Data)
summary(mod)
aug <- augment(mod)
resids <- residuals(mod)
fitted <- fitted(mod)
## Assumption 1: Residuals Normally Distributed
ggplot(aug) + geom_histogram(aes(x=.resid),
bins=15)
ggplot(aug) + geom_qq(aes(sample=.resid))
## Assumption 2: Homoscedasticity
ggplot(aug) + geom_point(aes(x=.fitted, y=.resid)) +
geom_hline(yintercept=0, lty=2)+
theme_bw()
## Assumption 4: Linear Relationship
ggplot(aug, aes(x=gender, y=perceivedStress)) +
geom_point()+
geom_smooth(method = "lm", se = FALSE)+
theme_bw()
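On the glm comparison: `glm()` with its default `family = gaussian()` (identity link) fits the same least-squares model as `lm()`, so identical coefficients are expected and do not indicate that either model fits better. A quick check on simulated data (`sim` is made up for illustration):

```r
# Simulated data for illustration only
set.seed(42)
sim <- data.frame(gender = rep(c(0, 1), each = 30))
sim$perceivedStress <- 6 + 1.5 * sim$gender + rnorm(60, sd = 3)

fit_lm  <- lm(perceivedStress ~ gender, data = sim)
fit_glm <- glm(perceivedStress ~ gender, data = sim)  # family = gaussian() by default

# Same coefficients (up to floating-point tolerance)
print(cbind(lm = coef(fit_lm), glm = coef(fit_glm)))
print(all.equal(coef(fit_lm), coef(fit_glm)))

# For the gaussian family, the glm deviance is the residual sum of squares
print(c(glm_deviance = deviance(fit_glm), lm_rss = sum(residuals(fit_lm)^2)))
```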
#ERROR OCCURS IN THIS CHUNK
#Confirm Model works using predictions and Model Specification
new<-(Data$gender=1)
full <- lm(formula = as.numeric(perceivedStress) ~ gender*age*Income*HSgrad, data=Data)
full
Data$lmprediction<- predict(full, Data = new, na.omit=TRUE)
var<-Data$perceivedStress
Data$lmprediction<- predict(full, Subset)
rmse2 <- function(x=gender, y=perceivedStress, data=Data, na.rm = TRUE){
res <- sqrt(mean((Data$gender-Data$perceivedStress)^2, na.rm = TRUE))
return(res)}
#observed RMSE of full model
rmse2(x=gender, y=lmprediction, data=Data)
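Two more things in this chunk are worth a look. First, `new<-(Data$gender=1)` does not select anything: inside parentheses `=` is assignment, so it sets `gender` to 1 in every row of `Data` (and stores 1 in `new`), which also leaves `full` fitted on a constant `gender` column. Selecting rows needs `==`. Second, `rmse2()` ignores its `x`, `y`, and `data` arguments and always compares `Data$gender` with `Data$perceivedStress`. A sketch with a hypothetical `rmse()` helper that uses its arguments, on a hypothetical `toy` data frame:

```r
# Hypothetical toy data standing in for Data (perceivedStress kept numeric)
toy <- data.frame(
  perceivedStress = c(12, 7, 0, 9, 10, 0, 5, 8),
  gender          = c(0, 1, 1, 1, 1, 0, 0, 1),
  Income          = c(1, 3, 4, 2, 2, NA, 3, 1)
)
fit <- lm(perceivedStress ~ gender + Income, data = toy)

# One prediction per row; NA where Income is missing
toy$lmprediction <- predict(fit, newdata = toy)

# Hypothetical helper: RMSE between observed and predicted values,
# ignoring pairs where either value is NA
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2, na.rm = TRUE))
}
print(rmse(toy$perceivedStress, toy$lmprediction))

# Predictions for the female group: select rows with `==`, not `=`
females <- toy[toy$gender == 1, ]
print(predict(fit, newdata = females))
```

This is an in-sample RMSE, computed on the same rows used to fit the model, so it tends to be optimistic about predictive performance.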
#test other models
model1 <- lm(formula = perceivedStress~., data=Data)
model1
#Models fitted: model (lm, perceivedStress ~ gender), mod (glm, same formula), model1 (all predictors, .), and full (interactions).
#Model validation through backwards selection
aic.backwards <- step(full, trace=TRUE)
glance(aic.backwards)
tidy(aic.backwards)
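For the backward selection, `step()` refits the model each time it drops a term and stops with "number of rows in use has changed: remove missing values?" if a smaller model can use more rows than the full one, for example once a term with missing values such as `Income` is dropped. Removing incomplete rows once, on the variables in the full model, keeps every candidate model on the same data. A sketch on hypothetical simulated data, using an additive formula for brevity rather than the question's interactions:

```r
# Hypothetical simulated data; Income has some missing values
set.seed(7)
n <- 60
toy <- data.frame(
  gender = rep(c(0, 1), length.out = n),
  age    = round(runif(n, 40, 70), 1),
  Income = sample(c(1:4, NA), n, replace = TRUE)
)
toy$perceivedStress <- 5 + 1.5 * toy$gender + 0.05 * toy$age + rnorm(n, sd = 3)

# Keep complete rows for the variables in the full model only
vars <- c("perceivedStress", "gender", "age", "Income")
toy_cc <- toy[complete.cases(toy[, vars]), vars]
toy_cc$Income <- factor(toy_cc$Income)

full_cc <- lm(perceivedStress ~ gender + age + Income, data = toy_cc)

# Backward selection by AIC; trace = 0 suppresses the step-by-step printout
selected <- step(full_cc, direction = "backward", trace = 0)
print(formula(selected))
print(nobs(selected) == nrow(toy_cc))   # selected model uses all complete rows
```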
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
