'Residual standard error in survey package
I am trying to calculate the residual standard error of a linear regression model using the survey package. I am working with a complex design, and the sampling weight of the complex design is given by "weight" in the code below.
fitM1 <- lm(med~x1+x2,data=pop_sample,weight=weight)
fitM2 <- svyglm(med~x1+x2,data=pop_sample,design=design)
First, if I call "summary(fitM1)", I get the following:
Call: lm(formula=med~x1+x2,data=pop_sample,weights=weight)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.001787 0.042194 0.042 0.966
x1 0.382709 0.061574 6.215 1.92e-09 ***
x2 0.958675 0.048483 19.773 < 2e-16 ***
Residual standard error: 9.231 on 272 degrees of freedom
Multiple R-squared: 0.8958, Adjusted R-squared: 0.8931
F-statistic: 334.1 on 7 and 272 DF, p-value: < 2.2e-16
Next, if I call "summary(fitM2)", I get the following:
summary(fitM2)
Call: svyglm(formula=med~x1+x2,data=pop_sample,design=design)
Survey design: svydesign(id=~id_cluster,strat=~id_stratum,weight=weight,data=pop_sample)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.001787 0.043388 0.041 0.967878
x1 0.382709 0.074755 5.120 0.000334 ***
x2 0.958675 0.041803 22.933 1.23e-10 ***
When using "lm", I can extract the residual standard error by calling:
fitMvariance <- summary(fitM1)$sigma^2
However, I can't find an analogous function for "svyglm" anywhere in the survey package. The point estimates are the same when comparing the two approaches, but the standard errors of the coefficients (and, presumably, the residual standard error of the model) are different.
Solution 1:[1]
Survey Analysis
use the library survey in the r to perform survey analysis, it offers a wide range of functions to calculate the statistics like Percentage, Lower CI, Upper CI, population and RSE.
RSE
we can use thesvyby function in the survey package to get all the statistics including the Root squared error
library("survey")
Survey design: svydesign(id=~id_cluster,strat=~id_stratum,weight=weight,data=pop_sample)
svyby(~med, ~x1+x2, design, svytotal, deff=TRUE, verbose=TRUE,vartype=c("se","cv","cvpct","var"))
The cvpct will give the root squared error
Refer for further information svyby
Solution 2:[2]
Because svyglm is built on glm not lm, the variance estimate is called $dispersion rather than $sigma
> data(api)
> dstrat<-svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat,
+ fpc = ~fpc)
> model<-svyglm(api00~ell+meals+mobility, design=dstrat)
> summary(model)$dispersion
variance SE
[1,] 5172 492.28
This is the estimate of $\sigma^2$, which is the population residual variance. In this example we actually have the whole population, so we can compare
> popmodel<-lm(api00~ell+meals+mobility, data=apipop)
> summary(popmodel)$sigma
[1] 70.58365
> sqrt(5172)
[1] 71.91662
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Community |
| Solution 2 | Thomas Lumley |
