Drop duplicate rows with the lowest value (dplyr) in R [duplicate]
I have a dataframe with duplicate rows, and I want to drop the duplicates with the lowest value, ideally using dplyr. I've tried the following, but it removes only some of the duplicate rows while others remain.
Below is an example of what the DF looks like. The "lowest value" to be removed should be determined by Col2; in other words, among duplicate rows, the one with the highest Col2 should be kept.
Current DataFrame
ID Col1 Col2
ABA 0.65 0.66
ABB 0.65 0.66
ABB 0.65 0.77
ABC 0.55 0.88
ABC 0.14 0.14
ABC 0.15 0.50
ABD 0.25 0.60
Desired DataFrame
ID Col1 Col2
ABA 0.65 0.66
ABB 0.65 0.77
ABC 0.55 0.88
ABD 0.25 0.60
Code Attempt
df %>% group_by(ID) %>% top_n(0, Col2)
and
df <- df[order(df$ID, df$Col2), ]
df <- df[ !duplicated(df$Col2), ]
Solution 1
A possible solution:
library(dplyr)
df <- data.frame(
stringsAsFactors = FALSE,
ID = c("ABA", "ABB", "ABB", "ABC", "ABC", "ABC", "ABD"),
Col1 = c(0.65, 0.65, 0.65, 0.55, 0.14, 0.15, 0.25),
Col2 = c(0.66, 0.66, 0.77, 0.88, 0.14, 0.5, 0.6)
)
df %>%
  group_by(ID) %>%
  slice_max(Col2, n = 1) %>%  # keep the row with the highest Col2 in each ID group
  ungroup()
#> # A tibble: 4 × 3
#> ID Col1 Col2
#> <chr> <dbl> <dbl>
#> 1 ABA 0.65 0.66
#> 2 ABB 0.65 0.77
#> 3 ABC 0.55 0.88
#> 4 ABD 0.25 0.6
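For reference, the same result can be obtained in base R. This is essentially a corrected version of the second attempt in the question, which deduplicated on Col2 rather than on ID; sorting Col2 in descending order within each ID and then keeping the first row per ID gives the desired output:

```r
df <- data.frame(
  stringsAsFactors = FALSE,
  ID = c("ABA", "ABB", "ABB", "ABC", "ABC", "ABC", "ABD"),
  Col1 = c(0.65, 0.65, 0.65, 0.55, 0.14, 0.15, 0.25),
  Col2 = c(0.66, 0.66, 0.77, 0.88, 0.14, 0.5, 0.6)
)

# Sort by ID ascending and Col2 descending, so the highest
# Col2 comes first within each ID group
df_sorted <- df[order(df$ID, -df$Col2), ]

# duplicated() marks second and later occurrences of each ID,
# so negating it keeps only the first (highest-Col2) row per ID
result <- df_sorted[!duplicated(df_sorted$ID), ]
```

Note that `slice_max()` keeps all tied rows by default; if exact ties in Col2 within an ID should still collapse to a single row, pass `with_ties = FALSE`.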
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
