'How to create a subset from a dataframe based on last exam status
I would like to know if there is a way to create a subset from a dataframe in python, based on the last exam status of a patient with a corresponding id (one id per patient)
For example, if a certain id has 5 exams (and exam_status can be 1 or 0) I would like to create a new dataframe based only on the last exam status (let's say is 1).The original df has more columns (72 to be exact)

ex: patient id13 has 2 exam status and i want a dataframe with the ids and corresponding only to the last status (either 0 or 1)
How can I do that?
solution (from whe answers!!):
df.groupby("id").last()[list(df.groupby("id")["exam_status"].last() == 1)]
ALSO, how can I create a subset with the patients that changed status (from 0 to 1). I know it's probably a small change in the given solution, but I'm new at learning python! Thank you!!
Solution 1:[1]
This should work:
df.groupby("id").last()[list(df.groupby("id")["exam_status"].last() == 1)]
Assuming your dataframe is df, you filter on id, take the last value, then filter by a list of booleans of: last exam_status == 1.
Mock data:
df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 13, 14, 14, 15],
'exam_status': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1],
'col3': [2, 3, 1, 2, 3, 2, 1, 23, 4, 3 ,1, 24, 5, 6, 3, 6, 1]})
Output for mock data:
#Out:
# exam_status col3
#id
#1 1 2
#2 1 3
#3 1 1
#4 1 2
#5 1 3
#6 1 2
#7 1 1
#8 1 23
#9 1 4
#10 1 3
#12 1 24
#13 1 6
#14 1 6
#15 1 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rawson |
