How to use the duplicated() function with dplyr in R?

In the data frame, if two or more rows share the same id, I want the new column status to read YES; otherwise it should read NO.

Here is my attempt:

library(dplyr)

set.seed(111)
id <- c(1,1,2,2,3,4,5,6)
val <- c(9,0,2,4,1,0,0,2)
df <- data.frame(val,id)

df <- df %>%
  group_by(id) %>%
  mutate(status = ifelse(duplicated(id), 'YES', 'NO'))


      val    id status
 
1     9     1 NO    
2     0     1 YES   
3     2     2 NO    
4     4     2 YES   
5     1     3 NO    
6     0     4 NO    
7     0     5 NO    
8     2     6 NO   

I want the table to instead read:

      val    id status
 
1     9     1 YES    
2     0     1 YES   
3     2     2 YES    
4     4     2 YES   
5     1     3 NO    
6     0     4 NO    
7     0     5 NO    
8     2     6 NO 


Solution 1:[1]

Here is a potential solution:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
set.seed(111)
id <- c(1,1,2,2,3,4,5,6)
val <- c(9,0,2,4,1,0,0,2)
df <- data.frame(val,id)

df %>%
  group_by(id) %>%
  mutate(status = ifelse(id == lag(id, default = 0), "YES", "NO"),
         status = ifelse(id == lead(id, default = 0), "YES", status))
#> # A tibble: 8 × 3
#> # Groups:   id [6]
#>     val    id status
#>   <dbl> <dbl> <chr> 
#> 1     9     1 YES   
#> 2     0     1 YES   
#> 3     2     2 YES   
#> 4     4     2 YES   
#> 5     1     3 NO    
#> 6     0     4 NO    
#> 7     0     5 NO    
#> 8     2     6 NO

Created on 2022-05-24 by the reprex package (v2.0.1)

Edit

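The solution above compares each row with its neighbours, which is fragile: without the group_by() step it fails when the ids are out of order, and default = 0 would wrongly match a real id of 0. Counting the rows in each group is a better approach: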

library(dplyr)
set.seed(111)
id <- c(1,1,2,3,2,4,5,6)
val <- c(9,0,2,4,1,0,0,2)
df <- data.frame(val,id)

df %>%
  group_by(id) %>%
  mutate(status = ifelse(n() > 1, "YES", "NO"))
#> # A tibble: 8 × 3
#> # Groups:   id [6]
#>     val    id status
#>   <dbl> <dbl> <chr> 
#> 1     9     1 YES   
#> 2     0     1 YES   
#> 3     2     2 YES   
#> 4     4     3 NO    
#> 5     1     2 YES   
#> 6     0     4 NO    
#> 7     0     5 NO    
#> 8     2     6 NO

Created on 2022-05-24 by the reprex package (v2.0.1)
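
As a side note on the function named in the question: duplicated() only flags the second and later occurrences of a value, which is why the first row of each pair came out as NO in the original attempt. A minimal sketch of one way to flag every occurrence, assuming the same example data as above, is to combine it with a second call using fromLast = TRUE. No grouping is needed here, and the name `result` is only used for illustration:

```r
library(dplyr)

id  <- c(1, 1, 2, 3, 2, 4, 5, 6)
val <- c(9, 0, 2, 4, 1, 0, 0, 2)
df  <- data.frame(val, id)

# duplicated(id) marks every occurrence after the first;
# duplicated(id, fromLast = TRUE) marks every occurrence before the last.
# Their union marks all rows whose id appears more than once.
result <- df %>%
  mutate(status = ifelse(duplicated(id) | duplicated(id, fromLast = TRUE),
                         "YES", "NO"))

print(result)
```

This should give the same status column as the n() > 1 version, and it does not depend on the order of the ids.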

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jared_mamrot