R: filter by group with multiple criteria
I have a dataset of student scores. Many student IDs have repeated rows, like this:
ID Date Score Source
1 2016-02-24 19.2 A
2 2020-01-08 16.6 B
3 2021-01-25 18.1 A
3 2021-01-25 16.2 C
4 2011-02-28 13.2 A
4 2011-02-28 17.4 A
5 2011-02-28 19.2 A
5 2011-02-28 14.6 C
6 2016-04-16 11.2 C
6 2016-04-16 12.4 C
My goal is to drop some of the repeated observations and keep only the rows selected by the following criteria.
Rule 1: same ID, same date, different source. Keep only the row where Source = A. For example, for IDs 3 and 5, keep these rows:
3 2021-01-25 18.1 A
5 2011-02-28 19.2 A
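Here is a minimal base R sketch of Rule 1 on its own. It assumes the data sits in a data frame called `df`, a name chosen here for illustration. `ave()` flags every row whose ID/Date group contains a Source "A" row. A row is kept if it is from A, or if its group has no A row at all. The question does not say what should happen when a group has different sources but none of them is A, so keeping those rows is an assumption.

```r
# Sample data from the question (df is a hypothetical name)
df <- data.frame(
  ID     = c(1, 2, 3, 3, 4, 4, 5, 5, 6, 6),
  Date   = as.Date(c("2016-02-24", "2020-01-08", "2021-01-25", "2021-01-25",
                     "2011-02-28", "2011-02-28", "2011-02-28", "2011-02-28",
                     "2016-04-16", "2016-04-16")),
  Score  = c(19.2, 16.6, 18.1, 16.2, 13.2, 17.4, 19.2, 14.6, 11.2, 12.4),
  Source = c("A", "B", "A", "C", "A", "A", "A", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# TRUE on every row whose ID/Date group contains at least one Source "A" row
has_A <- ave(df$Source == "A", df$ID, df$Date, FUN = any)

# Rule 1: in groups containing an "A" row, drop the non-A rows.
# Groups without any "A" row are left untouched (assumption).
rule1 <- df[df$Source == "A" | !has_A, ]
print(rule1)
```

For IDs 3 and 5 only the A rows survive. The other IDs are unchanged at this stage, because Rule 2 still has to resolve them.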
Rule 2: same ID, same date, same source. Keep the row with the maximum Score. For example, for IDs 4 and 6, keep these rows:
4 2011-02-28 17.4 A
6 2016-04-16 12.4 C
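Rule 2 on its own can be sketched the same way in base R, again with the hypothetical data frame `df`. `ave(..., FUN = max)` gives each row the maximum Score of its ID/Date/Source group, and only the rows that match that maximum are kept. If two rows tie for the maximum, both survive. The question does not cover ties, so that behaviour is an assumption.

```r
# Sample data from the question (df is a hypothetical name)
df <- data.frame(
  ID     = c(1, 2, 3, 3, 4, 4, 5, 5, 6, 6),
  Date   = as.Date(c("2016-02-24", "2020-01-08", "2021-01-25", "2021-01-25",
                     "2011-02-28", "2011-02-28", "2011-02-28", "2011-02-28",
                     "2016-04-16", "2016-04-16")),
  Score  = c(19.2, 16.6, 18.1, 16.2, 13.2, 17.4, 19.2, 14.6, 11.2, 12.4),
  Source = c("A", "B", "A", "C", "A", "A", "A", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# Maximum Score within each ID/Date/Source group, repeated on every row of the group
grp_max <- ave(df$Score, df$ID, df$Date, df$Source, FUN = max)

# Rule 2: keep only rows that reach their group's maximum (ties keep all tied rows)
rule2 <- df[df$Score == grp_max, ]
print(rule2)
```

Rows from different sources fall into separate groups here, so IDs 3 and 5 still have two rows each. Rule 1 is what resolves those.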
The final expected dataset:
ID Date Score Source
1 2016-02-24 19.2 A
2 2020-01-08 16.6 B
3 2021-01-25 18.1 A
4 2011-02-28 17.4 A
5 2011-02-28 19.2 A
6 2016-04-16 12.4 C
I am aware of group_by and filter, but I am not sure how to apply them in this situation. Any suggestions are much appreciated, thanks.
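The question mentions grouping and filtering, so here is one way to combine both rules with dplyr's `group_by()` and `filter()`. It assumes the dplyr package is installed and the data is in a data frame called `df`, which is a hypothetical name. Treat it as a sketch rather than the only approach. Rule 1 is applied first within each ID/Date group, and then the highest remaining Score is kept.

One assumption to flag: the second filter takes the maximum across the whole ID/Date group. A group whose rows come from different sources but include no A row would therefore also be reduced to its highest score, which is a case the question leaves open.

```r
library(dplyr)

# Sample data from the question (df is a hypothetical name)
df <- data.frame(
  ID     = c(1, 2, 3, 3, 4, 4, 5, 5, 6, 6),
  Date   = as.Date(c("2016-02-24", "2020-01-08", "2021-01-25", "2021-01-25",
                     "2011-02-28", "2011-02-28", "2011-02-28", "2011-02-28",
                     "2016-04-16", "2016-04-16")),
  Score  = c(19.2, 16.6, 18.1, 16.2, 13.2, 17.4, 19.2, 14.6, 11.2, 12.4),
  Source = c("A", "B", "A", "C", "A", "A", "A", "C", "C", "C"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(ID, Date) %>%
  # Rule 1: if the group has any Source "A" row, keep only the A rows
  filter(Source == "A" | !any(Source == "A")) %>%
  # Rule 2: among the remaining rows, keep the highest Score
  # (rows tied for the maximum are all kept)
  filter(Score == max(Score)) %>%
  ungroup()

print(result)
```

With the sample data this gives one row per ID, matching the expected dataset. If exactly one row per group is needed even when scores tie, `slice_max(Score, n = 1, with_ties = FALSE)` can replace the second `filter()`.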
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow