'Using other Column information to create a new column based on categories in R

I have a dataset with information along these lines:

d <- data.frame( 
  ID = c(1,2,3,4),
  Baseball = c(1,0,0,0),
  Basketball = c(1,0,1,0),
  Lacross = c(0,1,0,0),
  Hockey = c(1,0,0,0),
  Football = c(0,0,1,0)
)

Where 1 indicated that the person does the sport and 0 indicated they do not.

Now I want to create a new column that takes into account the sport season and apprehends them together (in this, Spring= baseball, lacrosse, Fall = football, Winter= basketball, hockey)

What I would want is an output something like:

ID   Baseball   Basketball   Lacross   Hockey   Football   Season
1       1            1          0         1         0      Spring, Winter
2       0            0          1         0         0      Spring
3       0            1          0         0         1      Winter, Fall
4       0            0          0         0         0      NA

Where I do not want duplicates if they play multiple sports in the same season like "Spring, Winter, Winter". Preferably if they could all be in the same order (all Spring, Winters are in that order), that would be great, but I can also edit that later.



Solution 1:[1]

Create a named vector and then loop over the rows, get the column names where the value is 1, use the named vector to match and paste the unique values

nm1 <- setNames( c("Spring", "Winter", "Spring", "Winter", "Fall"), names(d)[-1])
d$Season <- apply(d[-1], 1, \(x) toString(unique(nm1[names(x)][as.logical(x)])))
d$Season[d$Season == ""] <- NA

-output

> d
  ID Baseball Basketball Lacross Hockey Football         Season
1  1        1          1       0      1        0 Spring, Winter
2  2        0          0       1      0        0         Spring
3  3        0          1       0      0        1   Winter, Fall
4  4        0          0       0      0        0           <NA>

Solution 2:[2]

Here is another approach. Create a look up table called seasons, and merge it on a long version of d

library(data.table)
seasons = data.table(
  season = c("Spring", "Spring", "Fall", "Winter", "Winter"),
  sport = c("Baseball", "Lacross", "Football", "Basketball", "Hockey")
)
merge(
  setDT(d),
  melt(
    d, id="ID", variable.name = "sport")[value==1] %>% 
    .[seasons, on=.(sport)] %>% 
    .[,.(list(unique(season))), by=ID], 
  all.x = T
)

Output:

      ID Baseball Basketball Lacross Hockey Football            V1
   <num>    <num>      <num>   <num>  <num>    <num>        <list>
1:     1        1          1       0      1        0 Spring,Winter
2:     2        0          0       1      0        0        Spring
3:     3        0          1       0      0        1   Fall,Winter
4:     4        0          0       0      0        0              

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 akrun
Solution 2 langtang