'R: delete duplicate rows that have been flipped
Row 1 and row 4 have the same information. The only difference is the column they appear under has been flipped.
I already know Yuma County and Cheyenne County are neighbors from row 1. I don't need this information reiterated in row 4.
countyname fipscounty neighborname fipsneighbor
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
4 Cheyenne County, KS 20023 Yuma County, CO 8125
5 Cheyenne County, KS 20023 Dundy County, NE 31057
I don't mind that the counties appear more than once, I only care that the overall information in each row be different from the previous. I want to keep row 1 and delete row 4, so that the final looks like this
countyname fipscounty neighborname fipsneighbor
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
5 Cheyenne County, KS 20023 Dundy County, NE 31057
How can I delete rows with duplicate information in the dataset?
Solution 1:[1]
Here's another possible base R option:
df[!duplicated(t(apply(df, 1, sort))),]
Output
countyname fipscounty neighborname fipsneighbor
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
5 Cheyenne County, KS 20023 Dundy County, NE 31057
Data
df <- structure(list(countyname = c("Yuma County, CO", "Yuma County, CO",
"Cheyenne County, KS", "Cheyenne County, KS", "Cheyenne County, KS"
), fipscounty = c(8125L, 8125L, 20023L, 20023L, 20023L), neighborname = c("Cheyenne County, KS",
"Chase County, NE", "Kit Carson County, CO", "Yuma County, CO",
"Dundy County, NE"), fipsneighbor = c(20023L, 31029L, 8063L,
8125L, 31057L)), class = "data.frame", row.names = c(NA, -5L))
Solution 2:[2]
You could also do:
idx <- duplicated(t(apply(CountyList[c('fipscounty', 'fipsneighbor')], 1, sort)))
CountyList[!idx, ]
countyname fipscounty neighborname fipsneighbor
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
5 Cheyenne County, KS 20023 Dundy County, NE 31057
Solution 3:[3]
We can use interaction to generate unique factors after finding the name with "smaller" (i.e. first in alphabet) name as well as "larger" name. Then we can filter the data.frame based on that:
CountyList <- read.table(text="countyname fipscounty neighborname fipsneighbor
1 'Yuma County, CO' 8125 'Cheyenne County, KS' 20023
2 'Yuma County, CO' 8125 'Chase County, NE' 31029
3 'Cheyenne County, KS' 20023 'Kit Carson County, CO' 8063
4 'Cheyenne County, KS' 20023 'Yuma County, CO' 8125
5 'Cheyenne County, KS' 20023 'Dundy County, NE' 31057")
fname <- pmin(CountyList$countyname,CountyList$neighborname) #Get first name
lname <- pmax(CountyList$countyname,CountyList$neighborname) #Get last names
duplicate.key <- as.numeric(interaction(fname,lname)) # Create factors from interaction and convert to numeric
CountyList[match(unique(duplicate.key),duplicate.key),] # Only keep first occurence
countyname fipscounty neighborname fipsneighbor
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
5 Cheyenne County, KS 20023 Dundy County, NE 31057
Solution 4:[4]
Here's a tidyverse approach.
First unite all columns together into new_col (i.e. to paste all columns together). Then split the new_col back to it's individual pieces and sort them. Save this into new_col2. Next we only keep the distinct rows of new_col2. Finally removes the newly created columns.
library(tidyverse)
df %>%
unite("new_col", everything(), sep = "_", remove = F) %>%
rowwise() %>%
mutate(new_col2 = paste(sort(str_split(new_col, "_", simplify = T)), collapse = "")) %>%
ungroup() %>%
distinct(new_col2, .keep_all = T) %>%
select(-starts_with("new_col"))
# A tibble: 4 × 4
countyname fipscounty neighborname fipsneighbor
<chr> <int> <chr> <int>
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
4 Cheyenne County, KS 20023 Dundy County, NE 31057
Data
df <- structure(list(countyname = c("Yuma County, CO", "Yuma County, CO",
"Cheyenne County, KS", "Cheyenne County, KS", "Cheyenne County, KS"
), fipscounty = c(8125L, 8125L, 20023L, 20023L, 20023L), neighborname = c("Cheyenne County, KS",
"Chase County, NE", "Kit Carson County, CO", "Yuma County, CO",
"Dundy County, NE"), fipsneighbor = c(20023L, 31029L, 8063L,
8125L, 31057L)), class = "data.frame", row.names = c(NA, -5L))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | onyambu |
| Solution 3 | Julian_Hn |
| Solution 4 | benson23 |
