'R - Create new column that contains all values from that group
I have a DF that looks like this:
First Last First_Last Group Address Cost
Joe Smith Joe Smith SmithFamily 123 fake st 50
Jake Smith Jake Smith SmithFamily 123 fake st 50
Mike Smith Mike Smith SmithFamily 333 fake ave 50
Jessica Gonzales Jessica Gonzales SmithFamily 333 fake ave 50
Alfred Watson Alfred Watson WatsonFamily 999 fake ct 65
Emily Watson Emily Watson WatsonFamily 999 fake ct 65
Peter Gregory Peter Gregory WatsonFamily 999 fake ct 50
I want a new column that has "Associated Members" that basically has every other family member who shares the same group value
Expected outcome:
First Last First_Last Group Address Cost Associated
Joe Smith Joe Smith SmithFamily 123 fake st 50 Jake Smith, Mike Smith, Jessica Gonzales
Jake Smith Jake Smith SmithFamily 123 fake st 50 Joe Smith, Mike Smith, Jessica Gonzales
Mike Smith Mike Smith SmithFamily 333 fake ave 50 Joe Smith, Jake Smith, Jessica Gonzales
Jessica Gonzales Jessica Gonzales SmithFamily 333 fake ave 50 Joe Smith, Jake Smith, Mike Smith
Alfred Watson Alfred Watson WatsonFamily 999 fake ct 65 Emily Watson, Peter Gregory
Emily Watson Emily Watson WatsonFamily 999 fake ct 65 Alfred Watson, Peter Gregory
Peter Gregory Peter Gregory WatsonFamily 999 fake ct 50 Alfred Watson, Emily Watson
I have an almost there version where I can do:
df %>%
group_by(group) %>%
mutate(Associated= paste0(First_Last, collapse = ","))
But this outputs all of the people whose name is in the 'group' column, instead of 'everyone but this row's name'
Thank you!
Solution 1:[1]
After using paste0, you can use str_replace_all to remove a name if it occurs in First_Last. Then, we can use a combination of gsub and trimws to clean up the list and remove extra commas and spaces.
library(tidyverse)
df %>%
group_by(Group) %>%
mutate(
Associated = paste0(First_Last, collapse = ", "),
Associated = trimws(
gsub(
"^,*|(?<=,) ,|, *$",
"",
str_replace_all(Associated, First_Last, ""),
perl = T
),
whitespace = " ",
which = "both"
)
)
Output
First Last First_Last Group Address Cost Associated
<chr> <chr> <chr> <chr> <chr> <int> <chr>
1 Joe Smith Joe Smith SmithFamily 123fake 50 Jake Smith, Mike Smith, Jessica Gonzales
2 Jake Smith Jake Smith SmithFamily 123fake 50 Joe Smith, Mike Smith, Jessica Gonzales
3 Mike Smith Mike Smith SmithFamily 123fake 50 Joe Smith, Jake Smith, Jessica Gonzales
4 Jessica Gonzales Jessica Gonzales SmithFamily 123fake 50 Joe Smith, Jake Smith, Mike Smith
5 Alfred Watson Alfred Watson WatsonFamily 123fake 65 Emily Watson, Peter Gregory
6 Emily Watson Emily Watson WatsonFamily 123fake 65 Alfred Watson, Peter Gregory
7 Peter Gregory Peter Gregory WatsonFamily 123fake 50 Alfred Watson, Emily Watson
Data.table
Or we can use data.table, which would be much quicker. We can loop over the sequence, then use that to index the First_Last, then paste together via toString.
library(data.table)
dt <- as.data.table(df)
dt[,Associated:=
unlist(lapply(1:.N, function(i) toString(First_Last[-i]))) , Group]
Data
df <- structure(list(First = c("Joe", "Jake", "Mike", "Jessica", "Alfred",
"Emily", "Peter"), Last = c("Smith", "Smith", "Smith", "Gonzales",
"Watson", "Watson", "Gregory"), First_Last = c("Joe Smith", "Jake Smith",
"Mike Smith", "Jessica Gonzales", "Alfred Watson", "Emily Watson",
"Peter Gregory"), Group = c("SmithFamily", "SmithFamily", "SmithFamily",
"SmithFamily", "WatsonFamily", "WatsonFamily", "WatsonFamily"
), Address = c("123fake", "123fake", "123fake", "123fake", "123fake",
"123fake", "123fake"), Cost = c(50L, 50L, 50L, 50L, 65L, 65L,
50L)), class = "data.frame", row.names = c(NA, -7L))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
