'How to calculate bed mates and length of time spent together

I am working with some patient data in Pandas that looks roughly like the following

patient_id room_id start_date end_date
1 1001 01-01-2022 04:30:26 05-01-2022 10:14:01
1 1002 05-01-2022 10:14:01 10-01-2022 14:32:04
1 1003 10-01-2022 14:32:04 15-01-2022 08:24:16
1 1005 15-01-2022 08:24:16 22-01-2022 17:28:22
2 1004 03-01-2022 10:15:35 09-01-2022 17:02:11
2 1007 09-01-2022 17:02:11 12-01-2022 19:55:16
2 1003 12-01-2022 19:55:16 15-01-2022 08:24:16
2 1007 15-01-2022 08:24:16 22-01-2022 17:28:22
3 1004 05-01-2022 12:24:15 10-01-2022 18:49:17
3 1003 10-01-2022 18:49:17 12-01-2022 19:55:16
3 1007 12-01-2022 19:55:16 15-01-2022 08:24:16
4 1001 02-01-2022 06:36:41 16-01-2022 10:38:48

I am interested to calculate how many bed mates that patients have had in their respective rooms, and for how long for (in hours and days).

The data needs to be sorted by start_date to try and understand chronologically where patients started moving from and to.

So far I have tried df.groupby('room_id').start_date.diff(), but didn't quite get the results I expected.

How best can I approach this problem?



Solution 1:[1]

First of all, let's start by creating the dataframe OP mentions

import pandas as pd

patient_df = pd.DataFrame( { "patient_id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4], "room_id": [1001, 1002, 1003, 1005, 1004, 1007, 1003, 1007, 1004, 1003, 1007, 1001], "start_date": ["01-01-2022 04:30:26", "05-01-2022 10:14:01", "10-01-2022 14:32:04", "15-01-2022 08:24:16", "03-01-2022 10:15:35", "09-01-2022 17:02:11", "12-01-2022 19:55:16", "15-01-2022 08:24:16", "05-01-2022 12:24:15", "10-01-2022 18:49:17", "12-01-2022 19:55:16", "02-01-2022 06:36:41"], "end_date": ["05-01-2022 10:14:01", "10-01-2022 14:32:04", "15-01-2022 08:24:16", "22-01-2022 17:28:22", "09-01-2022 17:02:11", "12-01-2022 19:55:16", "15-01-2022 08:24:16", "22-01-2022 17:28:22", "10-01-2022 18:49:17", "12-01-2022 19:55:16", "15-01-2022 08:24:16", "16-01-2022 10:38:48"] } )

Now, in order to calculate how many bed mates the patients have had in their respective rooms for how long for (in hours and days), the following will do the work

newdf = patient_df.groupby(["patient_id", "room_id"]).agg({"start_date": "min", "end_date": "max"}).reset_index()

If one wants to sort the dataframe newdf by start_date starting from the earliest date, the following will do the work

newdf.sort_values(by=["start_date"], inplace=True)

This is the output of the final dataframe

print(newdf)

[Out]:
                             start_date             end_date
patient_id room_id                                          
1          1001     01-01-2022 04:30:26  05-01-2022 10:14:01
4          1001     02-01-2022 06:36:41  16-01-2022 10:38:48
2          1004     03-01-2022 10:15:35  09-01-2022 17:02:11
1          1002     05-01-2022 10:14:01  10-01-2022 14:32:04
3          1004     05-01-2022 12:24:15  10-01-2022 18:49:17
2          1007     09-01-2022 17:02:11  22-01-2022 17:28:22
1          1003     10-01-2022 14:32:04  15-01-2022 08:24:16
3          1003     10-01-2022 18:49:17  12-01-2022 19:55:16
2          1003     12-01-2022 19:55:16  15-01-2022 08:24:16
3          1007     12-01-2022 19:55:16  15-01-2022 08:24:16
1          1005     15-01-2022 08:24:16  22-01-2022 17:28:22

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1