Identifying last occurring duplicates in a vector in R
I would like to identify all unique values and the last occurring instances of repeated values in a vector. For example, I would like to identify the positions
c(2,3,4,6,7)
in the vector:
v <- c("m", "m", "k", "r", "l", "o", "l")
I see that
(duplicated(v) | duplicated(v, fromLast = T))
identifies all duplicated values, yet I would like to only identify the last occurring instances of duplicated elements.
How to achieve this without a loop?
Solution 1:[1]
You could do something like:
library(dplyr)
v %>%
  as_tibble() %>%
  mutate(index = row_number()) %>%
  group_by(value) %>%
  mutate(id = row_number()) %>%
  filter(id == max(id))
Which gives us:
# A tibble: 5 × 3
# Groups: value [5]
value index id
<chr> <int> <int>
1 m 2 2
2 k 3 1
3 r 4 1
4 o 6 1
5 l 7 2
Additionally, if you just want the index, you can do:
v %>%
  as_tibble() %>%
  mutate(index = row_number()) %>%
  group_by(value) %>%
  mutate(id = row_number()) %>%
  filter(id == max(id)) %>%
  pull(index)
...to get:
[1] 2 3 4 6 7
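The helper column id is not strictly needed: filtering on the largest index within each group should select the same rows. A minimal sketch of that idea, assuming dplyr is installed and building the tibble directly instead of going through as_tibble():

```r
library(dplyr)

v <- c("m", "m", "k", "r", "l", "o", "l")

# Pair each value with its position, then keep, per value,
# only the row holding the largest position.
last_idx <- tibble(value = v, index = seq_along(v)) %>%
  group_by(value) %>%
  filter(index == max(index)) %>%
  pull(index)

last_idx
```

Because filter() keeps the original row order, the positions come back in ascending order.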
Solution 2:[2]
We can try
> sort(tapply(seq_along(v), v, max))
m k r o l
2 3 4 6 7
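or, if the order of the result does not matter (the positions come back in order of first appearance of each value),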
> unique(ave(seq_along(v), v, FUN = max))
[1] 2 3 4 7 6
or
> rev(length(v) - which(!duplicated(rev(v))) + 1)
[1] 2 3 4 6 7
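The fromLast argument mentioned in the question can also be used on its own. The following is a minimal base R sketch, not taken from either answer: duplicated(v, fromLast = TRUE) flags every element that has an identical value later in the vector, so negating it leaves the unique values plus the last copy of each repeated value.

```r
v <- c("m", "m", "k", "r", "l", "o", "l")

# Scanning from the end, an element counts as "duplicated" when the same
# value occurs again further to the right. Negating that mask keeps the
# values that occur once and the final occurrence of each repeated value.
last_pos <- which(!duplicated(v, fromLast = TRUE))

last_pos
# [1] 2 3 4 6 7
```

This avoids reversing the vector twice and returns the positions already sorted.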
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Matt |
| Solution 2 | ThomasIsCoding |
