tidyr/purrr nest and map_dbl for stats (e.g., max, mean) returning incorrect values

I'm trying to add a variety of summary statistics with mutate(), based on various groupings in my nested data. I'd like to use this strategy instead of summarize() because I want to store the summary statistics in a tibble alongside the original data, including other identifying variables.

Group Name  Page Name  User     Date      num  min
Area A      Page 1     user265  22-04-13  14   10
Area A      Page 1     user265  22-04-14  5    3
Area B      Page 2     user275  22-04-01  12   6

There are 8 'groups' and hundreds of 'pages' nested across those groups. Before nesting, each row represented observations by Page Name, User, and Date.

When grouping/nesting by Group Name, the stats I generate for either 'num' or 'min' match what would be expected based on the nested values.

However, when I group by Page Name, the results make no sense given the data in the nested table. For example, the smallest value of 'num' and of 'min' in the data is 1, yet for one page I get a mean of 0.10 and a min of 0. Because the data are in long format, there are no missing values. I'm not sure why the results aren't consistent with the actual data in the nested table when grouping by Page Name.

library(dplyr)
library(tidyr)
library(purrr)

adopt_30_nest <- adopt_30 %>%
  # Select variables of interest
  select(page_name, group_name, user, date, num, min) %>%
  # Group by the grouping factor, then nest the remaining columns
  group_by(page_name) %>%
  nest() %>%
  # Create a new dbl column with the max num value for each page
  mutate(max_num = map_dbl(data, ~ max(.x$num)))
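
One way to narrow this down is to compute the same statistics with a plain group_by()/summarise() and compare them with the nested results, page by page. The following is a minimal sketch of that cross-check, not a fix. The tibble `toy` and its values are made up for illustration and only stand in for adopt_30, and the names `nested`, `reference`, `check`, and `mismatches` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy stand-in for adopt_30 (hypothetical values, same column names as in the question)
toy <- tibble(
  group_name = c("Area A", "Area A", "Area B", "Area B"),
  page_name  = c("Page 1", "Page 1", "Page 2", "Page 2"),
  user       = c("user265", "user265", "user275", "user280"),
  date       = c("22-04-13", "22-04-14", "22-04-01", "22-04-02"),
  num        = c(14, 5, 12, 7),
  min        = c(10, 3, 6, 2)
)

# Nested approach, mirroring the pipeline in the question
nested <- toy %>%
  group_by(page_name) %>%
  nest() %>%
  mutate(
    max_num  = map_dbl(data, ~ max(.x$num)),
    mean_num = map_dbl(data, ~ mean(.x$num))
  ) %>%
  ungroup()

# Reference values from an ordinary grouped summary
reference <- toy %>%
  group_by(page_name) %>%
  summarise(max_ref = max(num), mean_ref = mean(num), .groups = "drop")

# Join the two results and flag pages where they disagree
check <- nested %>%
  select(page_name, max_num, mean_num) %>%
  left_join(reference, by = "page_name") %>%
  mutate(agree = max_num == max_ref & abs(mean_num - mean_ref) < 1e-9)

# Pages whose nested statistics differ from the grouped summary
mismatches <- filter(check, !agree)
print(mismatches)
```

Running the same comparison on the real data, with adopt_30 in place of `toy`, shows whether any page actually differs between the two approaches. If `mismatches` is empty, the nested statistics match the grouped ones, and it may be worth checking which rows each nested table really contains.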

Any ideas for how to fix this? Thanks!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
