R: format a data frame with duplicated IDs and redundant information
From my data frame, I need to remove the uninformative values labelled "Not Done" and keep the informative value "Neg" whenever it is available in any of the rows sharing a duplicated ID. Sorry, this is not easy to explain. My data frame is below:
df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2","A2", "A3","A3", "A3"),
Variable1 = c("Neg", "Not Done","Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
Variable2 = c("Not Done", "Neg", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
Variable3 = c("Not Done","Not Done","Neg","Not Done","Not Done","Neg","Not Done","Not Done","Not Done"))
An example of the expected output:
df_A <- data.frame(ID = c("A1", "A2", "A3"),
Variable1 = c("Neg", "Neg", "Not Done"),
Variable2 = c("Neg", "Neg", "Not Done"),
Variable3 = c("Neg","Neg","Not Done"))
As you can see, for A3 all the values are "Not Done", so that row only needs to be kept once.
Solution 1:[1]
If the only values are "Neg" and "Not Done", I would convert them to TRUE and FALSE and use any inside aggregate.
aggregate(df[-1]=="Neg", df[1], any)
# ID Variable1 Variable2 Variable3
#1 A1 TRUE TRUE TRUE
#2 A2 TRUE TRUE TRUE
#3 A3 FALSE FALSE FALSE
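The result above is logical rather than the "Neg" / "Not Done" labels shown in the expected output. A minimal sketch of one way to map it back, assuming the data frame from the question and that "Neg" and "Not Done" are the only two values (the name res is just illustrative):

```r
# Data frame from the question
df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"),
                 Variable1 = c("Neg", "Not Done", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable2 = c("Not Done", "Neg", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable3 = c("Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done"))

# One row per ID: TRUE if any row of that ID has "Neg" in the column
res <- aggregate(df[-1] == "Neg", df[1], any)

# Translate the logical columns back into the original labels
res[-1] <- lapply(res[-1], function(x) ifelse(x, "Neg", "Not Done"))
res
```

If other values can occur in the real data, this mapping would silently turn them into "Not Done", so it only fits the two-value case.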
Solution 2:[2]
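library(dplyr)
# Drop exact duplicate rows first, then take the grouping IDs from the
# de-duplicated data so that they line up with its rows.
df <- distinct(df)
ID <- factor(df$ID)
# TRUE if "Neg" occurs anywhere in the vector
neg_find <- function(vector) {
  "Neg" %in% vector
}
# Apply neg_find to one column, grouped by ID
final_result_neg <- function(column) {
  tapply(column, ID, neg_find)
}
# Check every column except ID; the row names of df2 hold the IDs
df2 <- apply(df[-1], 2, final_result_neg) %>% data.frame()
df2[df2 == TRUE] <- "Neg"
df2[df2 == FALSE] <- "Not Done"
df2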
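Since this solution already loads dplyr, the same per-ID check can also be written as a grouped summary. This is only a sketch, assuming dplyr 1.0.0 or later (for across) and the data frame from the question; df_out is a hypothetical name, and the result is a tibble rather than a plain data frame:

```r
library(dplyr)

# Data frame from the question
df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"),
                 Variable1 = c("Neg", "Not Done", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable2 = c("Not Done", "Neg", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable3 = c("Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done"))

# For each ID and each non-grouping column, keep "Neg" if it occurs at all,
# otherwise fall back to "Not Done"
df_out <- df %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ if ("Neg" %in% .x) "Neg" else "Not Done"))
df_out
```

Unlike the row-name approach, this keeps ID as a regular column, which matches the layout of the expected output.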
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | GKi |
| Solution 2 | Arthur Vaz |
