R: collapse a data frame with duplicated IDs and redundant information

Each ID in my data frame appears in several rows. For each variable, I need to drop the uninformative "Not Done" values and keep the informative "Neg" value whenever it appears in any of the rows for that ID, so that each ID ends up on a single row. My data frame is below:

df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2","A2", "A3","A3", "A3"),
                 Variable1 = c("Neg", "Not Done","Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable2 = c("Not Done",  "Neg",  "Not Done", "Neg",  "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable3 = c("Not Done","Not Done","Neg","Not Done","Not Done","Neg","Not Done","Not Done","Not Done"))

The expected output:

df_A <- data.frame(ID = c("A1", "A2", "A3"),
                 Variable1 = c("Neg", "Neg", "Not Done"),
                 Variable2 = c("Neg", "Neg", "Not Done"),
                 Variable3 = c("Neg","Neg","Not Done"))

As you can see, for A3 all the values are "Not Done", so that ID should be kept once with "Not Done" in every column.



Solution 1:[1]

If the only values are "Neg" and "Not Done", I would convert them to TRUE and FALSE and use any with aggregate.

aggregate(df[-1]=="Neg", df[1], any)
#  ID Variable1 Variable2 Variable3
#1 A1      TRUE      TRUE      TRUE
#2 A2      TRUE      TRUE      TRUE
#3 A3     FALSE     FALSE     FALSE
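
The aggregate call returns logicals, whereas the expected output in the question uses the original labels. A minimal sketch of mapping them back, assuming "Neg" and "Not Done" are the only values present; `res` is a name chosen here for illustration:

```r
df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"),
                 Variable1 = c("Neg", "Not Done", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable2 = c("Not Done", "Neg", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable3 = c("Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done"))

# One row per ID; TRUE where "Neg" occurs in any row of that ID
res <- aggregate(df[-1] == "Neg", df[1], any)

# Map the logicals back to the original labels (every column except ID)
res[-1] <- lapply(res[-1], function(x) ifelse(x, "Neg", "Not Done"))

res
```

If other values can occur (for example "Pos"), this mapping would label them "Not Done", so the conversion step would need to be adapted.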

Solution 2:[2]

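library(dplyr)
df$ID <- factor(df$ID)

# Drop exact duplicate rows first, then take the grouping factor from the
# de-duplicated data so that it has the same length as each column
# (taking it before distinct() makes tapply fail on mismatched lengths)
df <- distinct(df)
ID <- factor(df$ID)

# TRUE if "Neg" occurs anywhere in the vector
neg_find <- function(vector) {
  result <- "Neg" %in% vector
  return(result)
}

# Apply neg_find to one column, grouped by ID
final_result_neg <- function(column) {
  t <- tapply(column, ID, neg_find)
  return(t)
}

# One row per ID (the IDs become the row names), one logical column per variable
df2 <- apply(df, 2, final_result_neg) %>% data.frame()

# The ID column only says whether "Neg" occurs among the IDs, so drop it
df2$ID <- NULL
df2[df2 == TRUE] <- 'Neg'
df2[df2 == FALSE] <- 'Not Done'

df2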
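
The per-column tapply idea from Solution 2 can also be sketched in base R alone, keeping ID as a regular column rather than as row names. This is a sketch, assuming "Neg" and "Not Done" are the only values; `has_neg` and `out` are names chosen here for illustration:

```r
df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"),
                 Variable1 = c("Neg", "Not Done", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable2 = c("Not Done", "Neg", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable3 = c("Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done"))

# For each variable column, check per ID whether any value equals "Neg".
# The result is a logical matrix with one row per ID and one column per variable.
has_neg <- sapply(df[-1], function(col) tapply(col == "Neg", df$ID, any))

# Rebuild a data frame with an explicit ID column and the original labels
out <- data.frame(ID = rownames(has_neg),
                  ifelse(has_neg, "Neg", "Not Done"),
                  row.names = NULL,
                  stringsAsFactors = FALSE)

out
```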

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution        Source
Solution 1 [1]  GKi
Solution 2 [2]  Arthur Vaz