Error: Variable length differs in lm regression using paste function

I have randomly generated a dataset that is split into two parts (L and I).

First I run the regression on L using all the covariates. After identifying the set of variables whose coefficients are significantly different from zero, I want to run the regression on I using only this set of variables.

reg_L = lm(y ~ ., data = data)
S_hat = as.data.frame(round(summary(reg_L)$coefficients[,"Pr(>|t|)"], 3)<0.05)
S_hat_L = rownames(which(S_hat==TRUE, arr.ind = TRUE))

Now I want to fit the new model, but it fails because of how the variable x is specified. What am I doing wrong?

# Using the I proportion to construct the p-values
x = noquote(paste(S_hat_L, collapse = " + "))
reg_I = lm(y ~ x, data = data)
summary(reg_I)

r linear-regression paste

Solution 1:^[1]

A simpler way than trying to manipulate a formula programmatically would be to remove the unwanted predictors from the data:

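# p-values from the full model fitted in the question (reg_L)
wanted <- summary(reg_L)$coefficients[, "Pr(>|t|)"] < 0.05
# drop the intercept so the names match columns of data (assumes numeric predictors)
keep <- setdiff(names(wanted)[wanted], "(Intercept)")
reduced.data <- data[, c("y", keep)]
reg_S <- lm(y ~ ., data = reduced.data)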
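
If building the formula from the selected names is still preferred, note why the original attempt fails: in `lm(y ~ x, data = data)`, `x` is a single character string of length 1, so `lm` treats it as a variable of length 1 and reports that variable lengths differ. A minimal sketch of one way around this uses `reformulate()` from base R to turn the names into a formula. The simulated data and the column names `x1` to `x5` are assumptions made for illustration, not the asker's data.

```r
# Simulated data; the column names x1..x5 are illustrative assumptions
set.seed(1)
n <- 200
data <- data.frame(matrix(rnorm(n * 5), n, 5))
names(data) <- paste0("x", 1:5)
data$y <- 2 * data$x1 - 3 * data$x3 + rnorm(n)

# Full model, then the names of the significant predictors (intercept excluded)
reg_L <- lm(y ~ ., data = data)
p <- summary(reg_L)$coefficients[, "Pr(>|t|)"]
S_hat_L <- setdiff(names(p)[p < 0.05], "(Intercept)")

# reformulate() builds a formula object from a character vector of term labels
f <- reformulate(S_hat_L, response = "y")
reg_I <- lm(f, data = data)
summary(reg_I)
```

`as.formula(paste("y ~", paste(S_hat_L, collapse = " + ")))` would be an equivalent route that stays closer to the `paste` approach in the question.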

Note, however, that selecting variables with the LASSO is more robust with respect to out-of-sample performance. It yields a model in which some coefficients are set to zero, while the remaining coefficients are adjusted so that out-of-sample performance is better.
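
As a rough sketch of the LASSO route, the following assumes the `glmnet` package is installed; the answer does not name a package, so this choice, the simulated data, and the use of `lambda.1se` are all assumptions for illustration.

```r
library(glmnet)  # assumption: glmnet is installed; the answer names no package

# Simulated predictors and response; names x1..x5 are illustrative
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- 2 * X[, "x1"] - 3 * X[, "x3"] + rnorm(n)

# alpha = 1 selects the LASSO penalty; lambda is chosen by cross-validation
cv_fit <- cv.glmnet(X, y, alpha = 1)

# Coefficients at the chosen lambda; zero entries are dropped predictors
b <- as.matrix(coef(cv_fit, s = "lambda.1se"))
selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
selected
```

Unlike the p-value filter, `glmnet` takes a numeric matrix rather than a formula, so factor predictors would first need to be expanded, for example with `model.matrix()`.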

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
