Removing duplicate data based on various conditions in SAS
In the following data set I want to remove duplicates based on several conditions:
For Auris disease:
- Same id, same condition (Auris): keep only the record with the first date, no matter what the date difference is.
For other disease conditions (Acino and CRE):
- Same id, same condition, two dates: keep the records with the first date and the last date if the date difference is more than 90 days.
- Same id, same condition, three or more dates: keep the records with the first date and the last date if the date difference is more than 90 days. Keep all three if the difference is more than 90 days between the first and second dates, and more than 90 days between the second and third dates.
    data have;
    input Id Disease $ Date :mmddyy10.;
    format date mmddyy10.;
    datalines;
    123 Auris 01/01/2021
    123 CRE 09/02/2020
    344 CRE 08/06/2019
    344 CRE 03/06/2020
    344 CRE 03/03/2021
    323 CRE 01/06/2019
    323 CRE 09/06/2020
    323 CRE 09/09/2020
    167 Acino 03/06/2020
    167 Acino 03/19/2020
    167 Acino 09/03/2021
    256 Auris 08/05/2020
    256 Auris 10/07/2021
    317 Acino 12/07/2018
    317 Acino 01/03/2018
    ;;;;
    run;
The result should look like this:
123 Auris 01/01/2021
123 CRE 09/02/2020
344 CRE 08/06/2019
344 CRE 03/06/2020
344 CRE 03/03/2021
323 CRE 01/06/2019
323 CRE 09/06/2020
167 Acino 03/06/2020
167 Acino 09/03/2021
256 Auris 08/05/2020
256 Auris 10/07/2021
317 Acino 12/07/2018
Thanks
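One possible reading of the rules above can be sketched as a sort followed by a DATA step with BY-group processing. This is only a sketch under assumptions, not a confirmed solution: the names `sorted`, `want`, and `last_kept` are made up for illustration, and the 90-day gap is measured against the most recently kept date within each Id and Disease group, which is one interpretation of the wording.

```sas
/* Sort so that each Id/Disease group is in date order. */
proc sort data=have out=sorted;
  by Id Disease Date;
run;

data want;
  set sorted;
  by Id Disease;
  retain last_kept;   /* hypothetical helper: date of the last record kept */

  if first.Disease then do;
    /* The earliest date in each group is always kept. */
    last_kept = Date;
    output;
  end;
  else if upcase(Disease) ne 'AURIS' and Date - last_kept > 90 then do;
    /* Assumption: for non-Auris diseases, keep a record only when it is
       more than 90 days after the most recently kept record. */
    last_kept = Date;
    output;
  end;

  drop last_kept;
run;
```

Note that the output is ordered by Id rather than in the original input order. Applied strictly, these rules keep only 08/05/2020 for id 256 (Auris) and keep both 01/03/2018 and 12/07/2018 for id 317 (Acino), which differs from the result listed above, so the stated rules or the sample data may need adjusting.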
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
