How can I remove duplicate values in different columns of R dataframe?
I would like a dataframe with duplicate values removed on a column-by-column basis.
In the example below, I want to keep only the rows whose C1 value does not appear anywhere in C3 or C4, and keep the whole row. So that:
- Row 1 is deleted because "a" appears in row 3 of C3.
- Row 2 is deleted because "b" appears in row 1 of C3.
- Row 3 is not deleted because there is no "c" in C3 or C4.
- Row 4 is deleted because "d" appears in rows 2 and 3 of C4.
- Row 5 is not deleted because there is no "e" in C3 or C4.
How can I do this? Thank you very much.
df <- data.frame(
  "C1" = c("a", "b", "c", "d", "e"),
  "C2" = c(1.2, 3.4, 4.5, 5.6, 7.8),
  "C3" = c("b", "b", "a", "d", "f"),
  "C4" = c("a", "d", "d", "a", "g"))
##   C1  C2 C3 C4
## 1  a 1.2  b  a
## 2  b 3.4  b  d
## 3  c 4.5  a  d
## 4  d 5.6  d  a
## 5  e 7.8  f  g
df_final <- data.frame(
  "C1" = c("c", "e"),
  "C2" = c(4.5, 7.8),
  "C3" = c("a", "f"),
  "C4" = c("d", "g"))
##   C1  C2 C3 C4
## 1  c 4.5  a  d
## 2  e 7.8  f  g
Solution 1:[1]
library(dplyr)
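df <- data.frame(
  "C1" = c("a", "b", "c", "d", "e"),
  "C2" = c(1.2, 3.4, 4.5, 5.6, 7.8),
  "C3" = c("b", "b", "a", "d", "f"),
  "C4" = c("a", "d", "d", "a", "g"))
# Keep the rows whose C1 value occurs in neither C3 nor C4.
# union(C3, C4) is the set of distinct values found in either column.
# The native pipe |> requires R 4.1 or later.
df |>
  filter(!C1 %in% union(C3, C4))
##>   C1  C2 C3 C4
##> 1  c 4.5  a  d
##> 2  e 7.8  f  g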
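If dplyr is not available, the same filter can probably be written in base R. The sketch below is not part of the original answer: the helper name `keep_unmatched` and its arguments `key` and `others` are hypothetical, introduced only for illustration, and it assumes the compared columns are all character vectors.

```r
df <- data.frame(
  C1 = c("a", "b", "c", "d", "e"),
  C2 = c(1.2, 3.4, 4.5, 5.6, 7.8),
  C3 = c("b", "b", "a", "d", "f"),
  C4 = c("a", "d", "d", "a", "g"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows whose `key` value is absent
# from every column named in `others`.
keep_unmatched <- function(data, key, others) {
  # Pool the distinct values found in the comparison columns
  seen <- unique(unlist(data[others], use.names = FALSE))
  # Logical row index: TRUE where the key value was never seen
  out <- data[!data[[key]] %in% seen, , drop = FALSE]
  rownames(out) <- NULL  # renumber rows 1..n, as dplyr::filter does
  out
}

result <- keep_unmatched(df, "C1", c("C3", "C4"))
print(result)
##   C1  C2 C3 C4
## 1  c 4.5  a  d
## 2  e 7.8  f  g
```

Passing the comparison columns as a character vector means the check extends to any number of columns, whereas `union()` combines only two vectors at a time.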
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stefano Barbi |
