Tidyverse and R: how to count rows in a tibble of a nested dataframe

I've checked multiple posts and haven't found an answer. According to this, my code should work, but it doesn't.

Objective: I want to print the number of subjects, which in this case is also the number of rows in this tibble.

Code:

 data<-read.csv("advanced_r_programming/data/MIE.csv")

make_LD<-function(x){
  LongitudinalData<-x%>%
    group_by(id)%>%
    nest()
  structure(list(LongitudinalData), class = "LongitudinalData")
}

print.LongitudinalData<-function(x){
  paste("Longitudinal dataset with", x[["id"]], "subjects")

}

x<-make_LD(data)

print(x)

Here's the head of the dataset I'm working on:

> head(x)
[[1]]
# A tibble: 10 x 2
      id                  data
   <int>                <list>
 1    14 <tibble [11,945 x 4]>
 2    20 <tibble [11,497 x 4]>
 3    41 <tibble [11,636 x 4]>
 4    44 <tibble [13,104 x 4]>
 5    46 <tibble [13,812 x 4]>
 6    54 <tibble [10,944 x 4]>
 7    64 <tibble [11,367 x 4]>
 8    74 <tibble [11,517 x 4]>
 9   104 <tibble [11,232 x 4]>
10   106 <tibble [13,823 x 4]>

Output:

[1] "Longitudinal dataset with  subjects"

I've tried every combination suggested in the aforementioned Stack Overflow post, and none of them works.



Solution 1:[1]

Here are two options:

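library(tidyverse)

# Create a nested data frame
dat = mtcars %>% 
  group_by(cyl) %>% 
  nest() %>% 
  as_tibble()   # as.tibble() is deprecated in favour of as_tibble()

dat
#     cyl               data
# 1     6  <tibble [7 x 10]>
# 2     4 <tibble [11 x 10]>
# 3     8 <tibble [14 x 10]>

# Option 1: map over the list column and count the rows of each nested tibble
dat %>% 
  mutate(nrow = map_int(data, nrow))

# Option 2: group by the key so that `data` holds a single tibble per group
dat %>% 
  group_by(cyl) %>% 
  mutate(nrow = nrow(data.frame(data)))

# Both options give:
#     cyl               data  nrow
# 1     6  <tibble [7 x 10]>     7
# 2     4 <tibble [11 x 10]>    11
# 3     8 <tibble [14 x 10]>    14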
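
Applied to the question's S3 class, the empty count is most likely because make_LD() wraps the nested tibble in list(), so x[["id"]] looks for a list element named "id" and returns NULL. Below is a minimal sketch of one possible fix, assuming the nested tibble stays the first element of the object. The toy data frame is a hypothetical stand-in for MIE.csv (only the id column name is taken from the question), and ld is just an illustrative variable name.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for MIE.csv: three subjects, a few observations each
toy <- data.frame(
  id    = c(14, 14, 20, 20, 20, 41),
  value = c(1.2, 3.4, 5.6, 7.8, 9.0, 2.1)
)

make_LD <- function(x) {
  nested <- x %>%
    group_by(id) %>%
    nest() %>%
    ungroup()
  # The nested tibble is stored as the first (unnamed) list element
  structure(list(nested), class = "LongitudinalData")
}

print.LongitudinalData <- function(x, ...) {
  # One row per subject in the nested tibble, so nrow() is the subject count
  n_subjects <- nrow(x[[1]])
  cat("Longitudinal dataset with", n_subjects, "subjects\n")
  invisible(x)
}

ld <- make_LD(toy)
print(ld)
```

Using cat() and returning invisible(x) follows the usual convention for print methods; paste() only builds a string and relies on auto-printing at the top level.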

Solution 2:[2]

dplyr has a dedicated function for counting the rows in each group: n()

You can simply do:

> mtcars %>% group_by(cyl) %>% summarise(rows = n())
# A tibble: 3 x 2
    cyl  rows
  <dbl> <int>
1     4    11
2     6     7
3     8    14

In more sophisticated cases, in which a subject may span multiple rows ("long format data"), you can do the following (assuming hp denotes the subject):

> mtcars %>% group_by(cyl, hp) %>% #always group by subject-ID last
+   summarise(n = n()) %>% #observations per subject and cyl
+   summarise(n = n()) #subjects per cyl (each summarise() drops the last grouping variable, so this call is grouped by cyl only)
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
# A tibble: 3 x 2
    cyl     n
  <dbl> <int>
1     4    10
2     6     4
3     8     9

Note that n in the last case is smaller than in the first, because cars with the same cyl and hp values are now counted as just one "subject".
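
As a possible shortcut for the long-format case, the same per-group subject count can be sketched with dplyr's n_distinct() instead of two chained summarise() calls. As in the answer, hp stands in for the subject ID; the names subjects_per_cyl and subjects are illustrative only.

```r
library(dplyr)

# Count distinct subject IDs within each cyl group in a single step
# (hp plays the role of the subject ID, as assumed in the answer)
subjects_per_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(subjects = n_distinct(hp))

subjects_per_cyl
```

This should give the same counts as the two-step version (10, 4 and 9 subjects for 4, 6 and 8 cylinders), without triggering the regrouping message.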

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 eipi10
Solution 2