'How to determine if a test was done how many years consecutively for individuals using pandas?
how do i determine if a certain population has met the conditions of having a test done consecutively for a few years based on their first test date? A example dataset can be found below. I'm thinking of using dataset 1 as df1 and Dataset 2 as df2 but my problem is I'm not sure how to use the first date of collection for different IDs to minus the different collection date of the same IDs?
Dataset 1:
| ID | Date of Collection | Test Done |
|---|---|---|
| My-ID 00001 | 10/05/2016 | 1A |
| My-ID 00001 | 10/01/2017 | 1A |
| My-ID 00001 | 23/01/2018 | 1A |
| My-ID 00001 | 18/04/2019 | 1A |
| My-ID 00001 | 30/04/2020 | 1A |
| My-ID 00002 | 30/09/2015 | 1A |
| My-ID 00002 | 31/05/2016 | 1A |
| My-ID 00002 | 31/05/2017 | 1A |
| My-ID 00003 | 31/05/2017 | 1A |
Dataset 2:
| ID | Test Done | Result | Date of Collection |
|---|---|---|---|
| My-ID 00001 | 1A | 50 | 10/05/2016 |
| My-ID 00002 | 1A | 75 | 30/09/2015 |
Desired Outcome:
| ID | Date of Collection | Test Done | Year since first collection date |
|---|---|---|---|
| My-ID 00001 | 10/05/2016 | 1A | 0 |
| My-ID 00001 | 10/01/2017 | 1A | 1 |
| My-ID 00001 | 23/01/2018 | 1A | 2 |
| My-ID 00001 | 18/04/2019 | 1A | 3 |
| My-ID 00001 | 30/04/2020 | 1A | 4 |
| My-ID 00002 | 30/09/2015 | 1A | 0 |
| My-ID 00002 | 31/05/2016 | 1A | 1 |
Solution 1:[1]
If possible use first year per groups substract it grom first values per groups by GroupBy.transform with GroupBy.first:
df1['Date of Collection'] = pd.to_datetime(df1['Date of Collection'], dayfirst=True)
y = df1['Date of Collection'].dt.year
df1['Year since first collection date'] = y.sub(y.groupby(df['ID']).transform('first'))
print (df1)
ID Date of Collection Test Done Year since first collection date
0 My-ID 00001 2016-05-10 1A 0
1 My-ID 00001 2017-01-10 1A 1
2 My-ID 00001 2018-01-23 1A 2
3 My-ID 00001 2019-04-18 1A 3
4 My-ID 00001 2020-04-30 1A 4
5 My-ID 00002 2015-09-30 1A 0
6 My-ID 00002 2016-05-31 1A 1
7 My-ID 00003 2017-05-31 1A 0
If need processing first values from df2 add left join before solution with DataFrame.merge:
df1['Date of Collection'] = pd.to_datetime(df1['Date of Collection'], dayfirst=True)
df2['Date of First Collection'] = pd.to_datetime(df2['Date of First Collection'], dayfirst=True)
y = df1.merge(df2, on=['ID','Test Done'], how='left')['Date of First Collection'].dt.year
df1['Year since first collection date'] = y.sub(y.groupby(df['ID']).transform('first'))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
