RStudio: finding p-values and R-squared for each subsample

Hi, I am very new to R and to this forum.

I want to run multiple regressions on subsamples from a large dataset. Here is a sample of my dataset, named "totaldoc": [image: sample dataset]

I want to run lm(numericdiffNGO ~ numericdiffmeeting) for each issue_name1.

I tried this command:

lapply(split(totaldoc, f = list(totaldoc$issue_name1)), function(x) lm(numericdiffNGO ~ numericdiffmeeting))

and this command:

ddply(totaldoc, "issue_name1", function(df) coefficients(lm(numericdiffNGO ~ numericdiffmeeting, data = df)))

But it only gives me the coefficients, and not even for all values of issue_name1. What I want is the p-value for each issue_name1 subsample, ranked from most significant to least significant, and likewise the R-squared for each subsample, ranked from highest to lowest.



Solution 1:[1]

Here's a stab using mtcars:

library(dplyr)
mtcars %>%
  group_nest(cyl) %>%
  mutate(
    model = lapply(data, function(z) lm(mpg ~ disp, data = z)), 
    summ = lapply(model, summary), 
    p.value = sapply(summ, function(z) coef(z)[2,"Pr(>|t|)"]), 
    rsq = sapply(summ, `[[`, "r.squared")
  ) %>%
  arrange(-p.value)
# # A tibble: 3 x 6
#     cyl                data model  summ       p.value    rsq
#   <dbl> <list<tibble[,10]>> <list> <list>       <dbl>  <dbl>
# 1     6            [7 x 10] <lm>   <smmry.lm> 0.826   0.0106
# 2     8           [14 x 10] <lm>   <smmry.lm> 0.0568  0.270 
# 3     4           [11 x 10] <lm>   <smmry.lm> 0.00278 0.648 
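
For the two rankings asked for in the question, here is a base-R sketch along the same lines, again on mtcars. It is only an illustration under assumptions: the names fits, res, by_p and by_rsq are made up for this example, and for the question's data you would presumably swap in totaldoc, issue_name1 and numericdiffNGO ~ numericdiffmeeting. Two things worth noting: the split()/lapply() attempt in the question does not pass data = x to lm(), so each fit is not restricted to its own subsample, and arrange(-p.value) above lists the largest p-value first, whereas order() on p.value below puts the most significant group first.

```r
# Fit one regression per subsample; data = d is what ties each fit to its group.
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) summary(lm(mpg ~ disp, data = d)))

# Collect the slope p-value and R-squared of every group into one data frame.
res <- data.frame(
  group   = names(fits),
  p.value = sapply(fits, function(s) coef(s)[2, "Pr(>|t|)"]),
  rsq     = sapply(fits, function(s) s$r.squared),
  row.names = NULL
)

by_p   <- res[order(res$p.value), ]                  # smallest p-value first
by_rsq <- res[order(res$rsq, decreasing = TRUE), ]   # highest R-squared first

print(by_p)
print(by_rsq)
```

If a subsample has too few rows (or a constant predictor), the slope row may be missing from coef(summary(...)) and the [2, ...] lookup will fail, so those groups may need to be filtered out first.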

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 r2evans