Remove duplicate rows based on value of another column

I'm an R newbie and this is my first SO post (though I'm a long-time user), so sorry if this is a dumb question. Thanks in advance for any assistance.

I've got a large dataset with more than 50 columns. Human error has resulted in some items being entered twice, but with different information in a key variable. I have reduced this to a two-column problem for simplicity: teacher-class number (tc_num) and exam status (x_gen).

I can't share the actual dataset, unfortunately, but here is essentially what I have:

tc_num	x_gen
12355	N
12355	Y
26421	Y
26421	N
78943	N
45679	Y

In the case of duplicate tc_num values (e.g., 12355, 26421), I want to select the row with the "Y" value and discard the row with the "N" value. However, most tc_num values are unique (e.g., 78943, 45679), and I want to keep all of those rows (in other words, I can't just discard all rows where x_gen = "N").

So, I want to keep all rows UNLESS there is a duplicate tc_num value, in which case I want to keep the one with the "Y" value.

Thanks in advance. I appreciate this community, as it's been a big help to me over the years.

r duplicates

Solution 1:^[1]

subset(df, x_gen == "Y" | ave(tc_num, tc_num, FUN = length) == 1)
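The one-liner above keeps a row if its x_gen is "Y" or if its tc_num occurs only once. Here is a minimal reproducible sketch of it, using the sample rows from the question. The names group_size, result, ordered and one_per_key are illustrative and not part of the original answer, and tc_num is assumed to be numeric.

```r
# Sample data copied from the question
df <- data.frame(
  tc_num = c(12355, 12355, 26421, 26421, 78943, 45679),
  x_gen  = c("N", "Y", "Y", "N", "N", "Y"),
  stringsAsFactors = FALSE
)

# For every row, the number of rows that share its tc_num
group_size <- ave(df$tc_num, df$tc_num, FUN = length)

# Keep a row if it is a "Y" row, or if its tc_num is unique
result <- subset(df, x_gen == "Y" | group_size == 1)
print(result)

# Alternative (not from the original answer): sort so that "Y" rows come
# first within each tc_num, then keep the first row per tc_num
ordered <- df[order(df$tc_num, df$x_gen != "Y"), ]
one_per_key <- ordered[!duplicated(ordered$tc_num), ]
print(one_per_key)
```

Note that the subset() approach drops every row of a duplicated tc_num that has no "Y" row, and keeps all "Y" rows if there are several. The order() plus duplicated() variant always keeps exactly one row per tc_num, preferring "Y", but returns the rows sorted by tc_num rather than in their original order.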

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
