R: delete duplicate rows that have been flipped

Row 1 and row 4 contain the same information. The only difference is that the county and neighbor columns are swapped.

I already know Yuma County and Cheyenne County are neighbors from row 1. I don't need this information reiterated in row 4.

           countyname fipscounty          neighborname fipsneighbor
1     Yuma County, CO       8125   Cheyenne County, KS        20023
2     Yuma County, CO       8125      Chase County, NE        31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
4 Cheyenne County, KS      20023       Yuma County, CO         8125
5 Cheyenne County, KS      20023      Dundy County, NE        31057

I don't mind that counties appear more than once; I only care that each row carries information not already present in an earlier row. I want to keep row 1 and delete row 4, so that the final result looks like this:

           countyname fipscounty          neighborname fipsneighbor
1     Yuma County, CO       8125   Cheyenne County, KS        20023
2     Yuma County, CO       8125      Chase County, NE        31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
5 Cheyenne County, KS      20023      Dundy County, NE        31057

How can I delete rows with duplicate information in the dataset?

r dataframe data-manipulation

Solution 1:^[1]

Here's a possible base R option: sort the values within each row, then drop every row whose sorted values have already appeared:

df[!duplicated(t(apply(df, 1, sort))),]

Output

           countyname fipscounty          neighborname fipsneighbor
1     Yuma County, CO       8125   Cheyenne County, KS        20023
2     Yuma County, CO       8125      Chase County, NE        31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
5 Cheyenne County, KS      20023      Dundy County, NE        31057

Data

df <- structure(list(countyname = c("Yuma County, CO", "Yuma County, CO", 
"Cheyenne County, KS", "Cheyenne County, KS", "Cheyenne County, KS"
), fipscounty = c(8125L, 8125L, 20023L, 20023L, 20023L), neighborname = c("Cheyenne County, KS", 
"Chase County, NE", "Kit Carson County, CO", "Yuma County, CO", 
"Dundy County, NE"), fipsneighbor = c(20023L, 31029L, 8063L, 
8125L, 31057L)), class = "data.frame", row.names = c(NA, -5L))
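
To make the one-liner above easier to follow, here is a sketch that splits it into its intermediate steps. The variable names (sorted_cols, keys, is_dup, result) are made up for illustration, and the data frame is rebuilt with data.frame() so the block runs on its own.

```r
# Sample data copied from the Data block above
df <- data.frame(
  countyname   = c("Yuma County, CO", "Yuma County, CO", "Cheyenne County, KS",
                   "Cheyenne County, KS", "Cheyenne County, KS"),
  fipscounty   = c(8125L, 8125L, 20023L, 20023L, 20023L),
  neighborname = c("Cheyenne County, KS", "Chase County, NE",
                   "Kit Carson County, CO", "Yuma County, CO", "Dundy County, NE"),
  fipsneighbor = c(20023L, 31029L, 8063L, 8125L, 31057L)
)

# Step 1: sort the four values of each row.
# apply() converts df to a character matrix and returns one COLUMN per input row.
sorted_cols <- apply(df, 1, sort)

# Step 2: transpose so that each matrix row is the sorted "key" of one df row.
keys <- t(sorted_cols)

# Step 3: flag rows whose key already appeared in an earlier row.
is_dup <- duplicated(keys)

# Step 4: keep only the first occurrence of each key.
result <- df[!is_dup, ]
print(result)
```

Because the sort ignores which column a value came from, a flipped row produces the same key as the row it mirrors, and duplicated() marks the later one.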

Solution 2:^[2]

You could also compare only the two FIPS code columns (CountyList is the data frame read in under Solution 3):

idx <- duplicated(t(apply(CountyList[c('fipscounty', 'fipsneighbor')], 1, sort)))
CountyList[!idx, ]

           countyname fipscounty          neighborname fipsneighbor
1     Yuma County, CO       8125   Cheyenne County, KS        20023
2     Yuma County, CO       8125      Chase County, NE        31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
5 Cheyenne County, KS      20023      Dundy County, NE        31057
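
As a possible variation on the same idea, the row-wise apply() can be replaced with the vectorized pmin() and pmax(), which may be faster on large tables. This is only a sketch: lo, hi, and deduped are hypothetical names, and it assumes the two FIPS columns alone identify a pair of neighbors.

```r
# Sample data matching the question
CountyList <- data.frame(
  countyname   = c("Yuma County, CO", "Yuma County, CO", "Cheyenne County, KS",
                   "Cheyenne County, KS", "Cheyenne County, KS"),
  fipscounty   = c(8125L, 8125L, 20023L, 20023L, 20023L),
  neighborname = c("Cheyenne County, KS", "Chase County, NE",
                   "Kit Carson County, CO", "Yuma County, CO", "Dundy County, NE"),
  fipsneighbor = c(20023L, 31029L, 8063L, 8125L, 31057L)
)

# Smaller and larger FIPS code of each row, computed for all rows at once
lo <- pmin(CountyList$fipscounty, CountyList$fipsneighbor)
hi <- pmax(CountyList$fipscounty, CountyList$fipsneighbor)

# A flipped row has the same (lo, hi) pair as its mirror, so duplicated() flags it
deduped <- CountyList[!duplicated(data.frame(lo, hi)), ]
print(deduped)
```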

Solution 3:^[3]

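We can use pmin and pmax to find, for each row, the alphabetically first ("smaller") and alphabetically last ("larger") of the two county names, then use interaction to turn each pair into a key. Rows with the same key describe the same pair of neighbors, so we keep only the first row for each key: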

CountyList <- read.table(text="countyname fipscounty          neighborname fipsneighbor
1     'Yuma County, CO'       8125   'Cheyenne County, KS'        20023
2     'Yuma County, CO'       8125      'Chase County, NE'        31029
3 'Cheyenne County, KS'      20023 'Kit Carson County, CO'         8063
4 'Cheyenne County, KS'      20023       'Yuma County, CO'         8125
5 'Cheyenne County, KS'      20023      'Dundy County, NE'        31057")


fname <- pmin(CountyList$countyname, CountyList$neighborname) # alphabetically first name of each pair
lname <- pmax(CountyList$countyname, CountyList$neighborname) # alphabetically last name of each pair

duplicate.key <- as.numeric(interaction(fname, lname)) # one numeric code per (first, last) combination

CountyList[match(unique(duplicate.key), duplicate.key), ] # keep only the first occurrence of each key


           countyname fipscounty          neighborname fipsneighbor
1     Yuma County, CO       8125   Cheyenne County, KS        20023
2     Yuma County, CO       8125      Chase County, NE        31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
5 Cheyenne County, KS      20023      Dundy County, NE        31057
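
To illustrate why the pmin/pmax step makes flipped rows comparable, here is a small sketch on just two mirrored rows. The vectors a and b and the pasted key are illustrative only; the "|" separator is an assumption and must not occur in the names themselves.

```r
# Two mirrored rows: (Yuma, Cheyenne) and (Cheyenne, Yuma)
a <- c("Yuma County, CO", "Cheyenne County, KS")
b <- c("Cheyenne County, KS", "Yuma County, CO")

first_name <- pmin(a, b)  # alphabetically first name in each row
last_name  <- pmax(a, b)  # alphabetically last name in each row

# Both rows now produce the same key, regardless of column order
key <- paste(first_name, last_name, sep = "|")
print(key)
print(duplicated(key))  # FALSE TRUE
```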

Solution 4:^[4]

Here's a tidyverse approach.

First, unite all columns into new_col (i.e. paste all columns together). Then split new_col back into its individual pieces, sort them, and save the result as new_col2. Next, keep only the rows with distinct values of new_col2. Finally, remove the newly created columns.

library(tidyverse)

df %>% 
  unite("new_col", everything(), sep = "_", remove = FALSE) %>% 
  rowwise() %>% 
  mutate(new_col2 = paste(sort(str_split(new_col, "_", simplify = TRUE)), collapse = "")) %>% 
  ungroup() %>% 
  distinct(new_col2, .keep_all = TRUE) %>% 
  select(-starts_with("new_col"))

# A tibble: 4 × 4
  countyname          fipscounty neighborname          fipsneighbor
  <chr>                    <int> <chr>                        <int>
1 Yuma County, CO           8125 Cheyenne County, KS          20023
2 Yuma County, CO           8125 Chase County, NE             31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
4 Cheyenne County, KS      20023 Dundy County, NE             31057

Data

df <- structure(list(countyname = c("Yuma County, CO", "Yuma County, CO", 
"Cheyenne County, KS", "Cheyenne County, KS", "Cheyenne County, KS"
), fipscounty = c(8125L, 8125L, 20023L, 20023L, 20023L), neighborname = c("Cheyenne County, KS", 
"Chase County, NE", "Kit Carson County, CO", "Yuma County, CO", 
"Dundy County, NE"), fipsneighbor = c(20023L, 31029L, 8063L, 
8125L, 31057L)), class = "data.frame", row.names = c(NA, -5L))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2	onyambu
Solution 3	Julian_Hn
Solution 4	benson23
