How to create a column that lists the number of occurrences of X in another column?
I've got a huge df that includes the following:
subsetdf <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17))
I want to add a column, GroupSize, that tells, for each Id, how many Ids (counting itself) share the same TicketNo value. In other words, I want output like this:
TheDream <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17),GroupSize=c(2,1,2,3,3,3))
I've unsuccessfully tried:
subsetdf <- subsetdf %>%
group_by(TicketNo) %>%
add_count(name = "GroupSize")
I'd like to use mutate() but I can't seem to get it right.
Edit
With the GroupSize column now added, I want to add a final column that looks at the values in two other columns and returns the value of whichever is higher. So I've got:
df <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17),GroupSize=c(2,1,2,3,3,3),FamilySize=c(2,2,1,1,4,4))
And I want:
df <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17),GroupSize=c(2,1,2,3,3,3),FamilySize=c(2,2,1,1,4,4),FinalSize=c(2,2,2,3,4,4))
I've unsuccessfully tried:
df <- df %>% pmax(df$GroupSize, df$FamilySize) %>% dplyr::mutate(FinalSize = n())
That attempt earns me the error:
Error: ! Subscript `i` is a matrix, the data `value` must have size 1.
Backtrace:
- ... %>% dplyr::mutate(Groupsize = n())
- base::pmax(., train_data$Family_size, train_data$PartySize)
- tibble:::`[<-.tbl_df`(`*tmp*`, change, value = `<int>`)
- tibble:::tbl_subassign_matrix(x, j, value, j_arg, substitute(value))
Solution 1:[1]
To do this with mutate(), group by TicketNo and use n() to get the size of each group. Also make sure that the mutate() being called is the one from dplyr: plyr has its own mutate(), which masks dplyr's if plyr is loaded after dplyr.
library(dplyr)
subsetdf %>%
group_by(TicketNo) %>%
dplyr::mutate(GroupSize = n())
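The follow-up in the Edit (taking the larger of GroupSize and FamilySize) can probably be handled in the same pipeline. The attempt in the question pipes the whole data frame into pmax(), so the data frame itself becomes the first argument being compared; calling pmax() inside mutate() compares the two columns element by element instead. The following is a minimal sketch, assuming the example data from the question, using tibble() in place of the deprecated data_frame(), and adding an ungroup() step (not in the original answer) so the second mutate() runs on ungrouped data:

```r
library(dplyr)

# Example data from the question, without the columns we want to compute
df <- tibble(
  Id         = 1:6,
  TicketNo   = c(15, 16, 15, 17, 17, 17),
  FamilySize = c(2, 2, 1, 1, 4, 4)
)

result <- df %>%
  group_by(TicketNo) %>%
  dplyr::mutate(GroupSize = n()) %>%                      # number of rows per TicketNo
  ungroup() %>%                                           # drop the grouping again
  dplyr::mutate(FinalSize = pmax(GroupSize, FamilySize))  # element-wise maximum

result
```

With this data, GroupSize and FinalSize should match the desired columns shown in the question.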
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | akrun |
