RStudio: finding p-values and R-squared for each subsample
Hi I am very new to R and to this forum.
I want to run multiple regressions on subsamples from a large dataset.
Here is a sample of my dataset named "totaldoc":

I want to run lm(numericdiffNGO ~ numericdiffmeeting) separately for each value of issue_name1.
I tried this command:
lapply(split(totaldoc, f = list(totaldoc$issue_name1)), function(x) lm(numericdiffNGO ~ numericdiffmeeting))
and this command:
ddply(totaldoc, "issue_name1", function(df) coefficients(lm(numericdiffNGO ~ numericdiffmeeting, data = df)))
But it only gives me the coefficients, and not even for every value of issue_name1. What I want is the p-value for each issue_name1 subsample, ranked from the most significant to the least significant, and the R-squared for each subsample, ranked the other way round, from the highest to the lowest.
Solution 1:[1]
Here's a stab using mtcars:
library(dplyr)
mtcars %>%
  group_nest(cyl) %>%
  mutate(
    model   = lapply(data, function(z) lm(mpg ~ disp, data = z)),
    summ    = lapply(model, summary),
    p.value = sapply(summ, function(z) coef(z)[2, "Pr(>|t|)"]),
    rsq     = sapply(summ, `[[`, "r.squared")
  ) %>%
  arrange(-p.value)
# # A tibble: 3 x 6
#     cyl                data model  summ        p.value    rsq
#   <dbl> <list<tibble[,10]>> <list> <list>        <dbl>  <dbl>
# 1     6            [7 x 10] <lm>   <smmry.lm>  0.826   0.0106
# 2     8           [14 x 10] <lm>   <smmry.lm>  0.0568  0.270
# 3     4           [11 x 10] <lm>   <smmry.lm>  0.00278 0.648
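Note that arrange(-p.value) puts the largest p-value first, whereas the question asks for the most significant subsample first. Below is a minimal base-R sketch of the two rankings, still using mtcars as a stand-in for totaldoc. The names fits, res, by_p and by_rsq are only illustrative, and the sketch assumes every group has enough distinct predictor values for a slope to be estimated (otherwise the second coefficient row would be missing and the lookup would fail).

```r
# Fit mpg ~ disp separately within each cyl group and keep the model summaries.
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) summary(lm(mpg ~ disp, data = d)))

# One row per group: slope p-value and R-squared.
res <- data.frame(
  cyl       = names(fits),
  p.value   = sapply(fits, function(s) coef(s)[2, "Pr(>|t|)"]),
  rsq       = sapply(fits, function(s) s$r.squared),
  row.names = NULL
)

# Most significant first: smallest p-value at the top.
by_p <- res[order(res$p.value), ]

# Best fit first: largest R-squared at the top.
by_rsq <- res[order(-res$rsq), ]

print(by_p)
print(by_rsq)
```

For the data in the question, the same pattern would presumably apply with split(totaldoc, totaldoc$issue_name1) and the formula numericdiffNGO ~ numericdiffmeeting, passing data = d inside lm() so that each fit uses only its own subsample.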
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | r2evans |
