'How to create a subset from a dataframe based on last exam status

I would like to know if there is a way to create a subset from a dataframe in python, based on the last exam status of a patient with a corresponding id (one id per patient)

For example, if a certain id has 5 exams (and exam_status can be 1 or 0) I would like to create a new dataframe based only on the last exam status (let's say is 1).The original df has more columns (72 to be exact)

example

ex: patient id13 has 2 exam status and i want a dataframe with the ids and corresponding only to the last status (either 0 or 1)

How can I do that?

solution (from whe answers!!):

    df.groupby("id").last()[list(df.groupby("id")["exam_status"].last() == 1)]

ALSO, how can I create a subset with the patients that changed status (from 0 to 1). I know it's probably a small change in the given solution, but I'm new at learning python! Thank you!!



Solution 1:[1]

This should work:

df.groupby("id").last()[list(df.groupby("id")["exam_status"].last() == 1)]

Assuming your dataframe is df, you filter on id, take the last value, then filter by a list of booleans of: last exam_status == 1.

Mock data:

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 13, 14, 14, 15],
                   'exam_status': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1],
                   'col3': [2, 3, 1, 2, 3, 2, 1, 23, 4, 3 ,1, 24, 5, 6, 3, 6, 1]})

Output for mock data:

#Out: 
#    exam_status  col3
#id                   
#1             1     2
#2             1     3
#3             1     1
#4             1     2
#5             1     3
#6             1     2
#7             1     1
#8             1    23
#9             1     4
#10            1     3
#12            1    24
#13            1     6
#14            1     6
#15            1     1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rawson