'R - Create new column that contains all values from that group

I have a DF that looks like this:

First   Last    First_Last  Group   Address Cost
Joe Smith   Joe Smith   SmithFamily 123 fake st 50
Jake    Smith   Jake Smith  SmithFamily 123 fake st 50
Mike    Smith   Mike Smith  SmithFamily 333 fake ave    50
Jessica Gonzales    Jessica Gonzales    SmithFamily 333 fake ave    50
Alfred  Watson  Alfred Watson   WatsonFamily    999 fake ct 65
Emily   Watson  Emily Watson    WatsonFamily    999 fake ct 65
Peter   Gregory Peter Gregory   WatsonFamily    999 fake ct 50

I want a new column that has "Associated Members" that basically has every other family member who shares the same group value

Expected outcome:

First   Last    First_Last  Group   Address Cost    Associated
Joe Smith   Joe Smith   SmithFamily 123 fake st 50  Jake Smith, Mike Smith, Jessica Gonzales
Jake    Smith   Jake Smith  SmithFamily 123 fake st 50  Joe Smith, Mike Smith, Jessica Gonzales
Mike    Smith   Mike Smith  SmithFamily 333 fake ave    50  Joe Smith, Jake Smith, Jessica Gonzales
Jessica Gonzales    Jessica Gonzales    SmithFamily 333 fake ave    50  Joe Smith, Jake Smith, Mike Smith
Alfred  Watson  Alfred Watson   WatsonFamily    999 fake ct 65  Emily Watson, Peter Gregory
Emily   Watson  Emily Watson    WatsonFamily    999 fake ct 65  Alfred Watson, Peter Gregory
Peter   Gregory Peter Gregory   WatsonFamily    999 fake ct 50  Alfred Watson, Emily Watson

I have an almost there version where I can do:

df %>% 
     group_by(group) %>% 
     mutate(Associated= paste0(First_Last, collapse = ",")) 

But this outputs all of the people whose name is in the 'group' column, instead of 'everyone but this row's name'

Thank you!



Solution 1:[1]

After using paste0, you can use str_replace_all to remove a name if it occurs in First_Last. Then, we can use a combination of gsub and trimws to clean up the list and remove extra commas and spaces.

library(tidyverse)

df %>%
  group_by(Group) %>%
  mutate(
    Associated = paste0(First_Last, collapse = ", "),
    Associated = trimws(
      gsub(
        "^,*|(?<=,) ,|, *$",
        "",
        str_replace_all(Associated, First_Last, ""),
        perl = T
      ),
      whitespace = " ",
      which = "both"
    )
  )

Output

  First   Last     First_Last       Group        Address  Cost Associated                              
  <chr>   <chr>    <chr>            <chr>        <chr>   <int> <chr>                                   
1 Joe     Smith    Joe Smith        SmithFamily  123fake    50 Jake Smith, Mike Smith, Jessica Gonzales
2 Jake    Smith    Jake Smith       SmithFamily  123fake    50 Joe Smith, Mike Smith, Jessica Gonzales 
3 Mike    Smith    Mike Smith       SmithFamily  123fake    50 Joe Smith, Jake Smith, Jessica Gonzales 
4 Jessica Gonzales Jessica Gonzales SmithFamily  123fake    50 Joe Smith, Jake Smith, Mike Smith       
5 Alfred  Watson   Alfred Watson    WatsonFamily 123fake    65 Emily Watson, Peter Gregory             
6 Emily   Watson   Emily Watson     WatsonFamily 123fake    65 Alfred Watson, Peter Gregory            
7 Peter   Gregory  Peter Gregory    WatsonFamily 123fake    50 Alfred Watson, Emily Watson    

Data.table

Or we can use data.table, which would be much quicker. We can loop over the sequence, then use that to index the First_Last, then paste together via toString.

library(data.table)
dt <- as.data.table(df)

dt[,Associated:=
    unlist(lapply(1:.N, function(i) toString(First_Last[-i]))) , Group]

Data

df <- structure(list(First = c("Joe", "Jake", "Mike", "Jessica", "Alfred", 
"Emily", "Peter"), Last = c("Smith", "Smith", "Smith", "Gonzales", 
"Watson", "Watson", "Gregory"), First_Last = c("Joe Smith", "Jake Smith", 
"Mike Smith", "Jessica Gonzales", "Alfred Watson", "Emily Watson", 
"Peter Gregory"), Group = c("SmithFamily", "SmithFamily", "SmithFamily", 
"SmithFamily", "WatsonFamily", "WatsonFamily", "WatsonFamily"
), Address = c("123fake", "123fake", "123fake", "123fake", "123fake", 
"123fake", "123fake"), Cost = c(50L, 50L, 50L, 50L, 65L, 65L, 
50L)), class = "data.frame", row.names = c(NA, -7L))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1