Error: Variable length differs in lm regression using paste function
I have randomly generated a dataset and split it into two parts (L and I).
First I run the regression on L using all the covariates. After identifying the variables whose coefficients are significantly different from zero, I want to run the regression on I using only that set of variables.
reg_L = lm(y ~ ., data = data)
S_hat = as.data.frame(round(summary(reg_L)$coefficients[,"Pr(>|t|)"], 3)<0.05)
S_hat_L = rownames(which(S_hat==TRUE, arr.ind = TRUE))
Next I want to fit the new model, but it fails, apparently only because of how the variable x is specified. What am I doing wrong?
# Using the I proportion to construct the p-values
x = noquote(paste(S_hat_L, collapse = " + "))
reg_I = lm(y ~ x, data = data)
summary(reg_I)
Solution 1:[1]
A simpler way than trying to manipulate a formula programmatically would be to remove the unwanted predictors from the data:
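# p-values from the full model fitted in the question (reg_L)
pvals <- summary(reg_L)$coefficients[, "Pr(>|t|)"]
# significant terms, excluding the intercept (it is not a column of data)
wanted <- setdiff(names(pvals)[pvals < 0.05], "(Intercept)")
reduced.data <- data[, c("y", wanted), drop = FALSE]  # keep the response y too
reg_S <- lm(y ~ ., data = reduced.data)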
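If you do want to build the formula programmatically, the reason the original attempt fails is that `x` is a single character string: `lm` treats it as a variable of length 1, while `y` has one value per observation, hence "variable lengths differ". A minimal sketch of one way around this, using `reformulate()` from base R to turn the names into a real formula object. The simulated data frame and the column names `x1` to `x3` are hypothetical stand-ins for the asker's data:

```r
# Simulated data standing in for the asker's dataset (hypothetical names)
set.seed(1)
n <- 200
data <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
data$y <- 2 * data$x1 - 3 * data$x2 + rnorm(n)

# Full model on all covariates
reg_L <- lm(y ~ ., data = data)

# Names of the significant terms, without the intercept
pvals <- summary(reg_L)$coefficients[, "Pr(>|t|)"]
S_hat_L <- setdiff(names(pvals)[pvals < 0.05], "(Intercept)")

# Build a formula object (y ~ term1 + term2 + ...) instead of a string
f <- reformulate(S_hat_L, response = "y")
reg_I <- lm(f, data = data)
summary(reg_I)
```

`as.formula(paste("y ~", paste(S_hat_L, collapse = " + ")))` should give an equivalent formula. Either way, this assumes at least one term is significant and that the coefficient names match column names, which may not hold for factor predictors.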
Note, however, that selecting variables with the LASSO is more robust with respect to out-of-sample performance. It yields a model in which some coefficients are set to zero, while the remaining coefficients are shrunk in a way that tends to improve out-of-sample performance.
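As a rough illustration of the LASSO route, here is a sketch that assumes the `glmnet` package is installed (the answer does not name a package, so this choice is an assumption). The simulated matrix `X` and the names `x1` to `x5` are hypothetical:

```r
# Assumes glmnet is installed: install.packages("glmnet")
library(glmnet)

# Simulated predictors and response (hypothetical data)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- 2 * X[, "x1"] - 3 * X[, "x2"] + rnorm(n)

# alpha = 1 selects the LASSO penalty; lambda is chosen by cross-validation
cv_fit <- cv.glmnet(X, y, alpha = 1)

# Coefficients at lambda.1se as a named numeric vector
b <- as.matrix(coef(cv_fit, s = "lambda.1se"))[, 1]

# Predictors with non-zero coefficients are the ones the LASSO kept
selected <- setdiff(names(b)[b != 0], "(Intercept)")
selected
```

Using `s = "lambda.min"` instead typically keeps more predictors; which one is preferable depends on how sparse you want the model to be.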
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
