Combining/aggregating data in R

I feel like this is a really simple question, and I've looked in a lot of places for an answer, but everything I find seems to do a lot more than what I want.

I have a dataset that has multiple observations from multiple participants. One of the factors is where they're from (e.g. Buckinghamshire, Sussex, London). I want to combine everything that isn't London so I have two categories, London and notLondon. How would I do this? I'd then want to be able to run an lm on these two, so how would I edit my dataset so that I could do lm(fom ~ [other factor]) where it would be the combined category?

Also, how would I combine all observations from each respective participant for a category? e.g. I have a category that's birth year, but currently when I do a summary of my data it will say, for example, 1996:265, because there are 265 observations from people born in '96. But I just want it to tell me how many participants were born in 1996.

Thanks!

r statistics

Solution 1:[1]

There are multiple parts to your question so let's take it step by step.

1.

For the first part, this is a great use of forcats::fct_collapse() (forcats is loaded with the tidyverse). See the example here:

library(tidyverse)

set.seed(1)
d <- sample(letters[1:5], 20, replace = TRUE) %>% factor()

# original distribution
table(d)
#> d
#> a b c d e 
#> 6 4 3 1 6

# lumped distribution
fct_collapse(d, a = "a", other_level = "other") %>% table()
#> .
#>     a other 
#>     6    14

Created on 2022-02-10 by the reprex package (v2.0.1)
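To connect this to the question, here is a rough sketch of the same call applied to a column of a data frame. The data frame df, its values, and the column names participant, region and london are all made up for illustration; only fom and the region names come from the question.

```r
library(forcats)

# Hypothetical data: two observations per participant
df <- data.frame(
  participant = c(1, 1, 2, 2, 3, 3, 4, 4),
  region = factor(c("London", "London", "Sussex", "Sussex",
                    "Buckinghamshire", "Buckinghamshire",
                    "London", "London")),
  fom = c(2.1, 2.4, 3.0, 3.3, 2.8, 2.9, 1.9, 2.2)
)

# Keep "London" as its own level and lump every other level into "notLondon"
df$london <- fct_collapse(df$region, London = "London",
                          other_level = "notLondon")

# Check the new two-level factor
table(df$london)
```

The original region column is left untouched, so you can always go back to the full set of levels.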

2.

For the second part, you will have to clarify and share some data to get more help.
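If the second part is simply about fitting the model on the two-level factor, a minimal sketch could look like the following. It uses base R only; the data and the london column name are invented, and fom is assumed to be the numeric response from the question.

```r
# Hypothetical data; `fom` stands in for the response named in the question
df <- data.frame(
  region = c("London", "Sussex", "Buckinghamshire",
             "London", "Sussex", "London"),
  fom = c(2.1, 3.0, 2.8, 1.9, 3.3, 2.2)
)

# Two-level factor: London vs. everything else
df$london <- factor(ifelse(df$region == "London", "London", "notLondon"))

# With a two-level predictor, the slope is the difference in group means
fit <- lm(fom ~ london, data = df)
coef(fit)
```

Keep in mind that lm() treats every row as an independent observation, which may or may not be appropriate when each participant contributes several rows.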

3.

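Here you can use dplyr::summarize(), but you need to share some data to get an answer for your specific case.

However, something like:

df %>% group_by(birth_year) %>% summarize(n = n_distinct(participant))

will give you the number of participants with each birth year, assuming participant is the column that identifies each person (that name is a guess, since no data was shared). Note that summarize(n = n()) would count rows instead, which is the per-observation total that summary() already reports.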
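Because the data has several rows per participant, it helps to see row counts and participant counts side by side. The sketch below uses a made-up data frame, and participant is an assumed ID column, not a name from the question.

```r
library(dplyr)

# Hypothetical data: participant 1 has three rows, participant 2 has two
df <- data.frame(
  participant = c(1, 1, 1, 2, 2, 3),
  birth_year = c(1996, 1996, 1996, 1996, 1996, 1998)
)

# Rows (observations) per birth year: this is what summary() reports
rows_per_year <- df %>% count(birth_year)

# Distinct participants per birth year
people_per_year <- df %>%
  group_by(birth_year) %>%
  summarize(n = n_distinct(participant))

rows_per_year
people_per_year
```

An alternative with the same result is to reduce the data to one row per participant first, with distinct(participant, birth_year), and then count(birth_year).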

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Dan Adams
