'How to calculate bed mates and length of time spent together

I am working with some patient data in Pandas that looks roughly like the following

patient_id	room_id	start_date	end_date
1	1001	01-01-2022 04:30:26	05-01-2022 10:14:01
1	1002	05-01-2022 10:14:01	10-01-2022 14:32:04
1	1003	10-01-2022 14:32:04	15-01-2022 08:24:16
1	1005	15-01-2022 08:24:16	22-01-2022 17:28:22
2	1004	03-01-2022 10:15:35	09-01-2022 17:02:11
2	1007	09-01-2022 17:02:11	12-01-2022 19:55:16
2	1003	12-01-2022 19:55:16	15-01-2022 08:24:16
2	1007	15-01-2022 08:24:16	22-01-2022 17:28:22
3	1004	05-01-2022 12:24:15	10-01-2022 18:49:17
3	1003	10-01-2022 18:49:17	12-01-2022 19:55:16
3	1007	12-01-2022 19:55:16	15-01-2022 08:24:16
4	1001	02-01-2022 06:36:41	16-01-2022 10:38:48

I am interested to calculate how many bed mates that patients have had in their respective rooms, and for how long for (in hours and days).

The data needs to be sorted by start_date to try and understand chronologically where patients started moving from and to.

So far I have tried df.groupby('room_id').start_date.diff(), but didn't quite get the results I expected.

How best can I approach this problem?

python pandas

Solution 1:^[1]

First of all, let's start by creating the dataframe OP mentions

import pandas as pd

patient_df = pd.DataFrame( { "patient_id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4], "room_id": [1001, 1002, 1003, 1005, 1004, 1007, 1003, 1007, 1004, 1003, 1007, 1001], "start_date": ["01-01-2022 04:30:26", "05-01-2022 10:14:01", "10-01-2022 14:32:04", "15-01-2022 08:24:16", "03-01-2022 10:15:35", "09-01-2022 17:02:11", "12-01-2022 19:55:16", "15-01-2022 08:24:16", "05-01-2022 12:24:15", "10-01-2022 18:49:17", "12-01-2022 19:55:16", "02-01-2022 06:36:41"], "end_date": ["05-01-2022 10:14:01", "10-01-2022 14:32:04", "15-01-2022 08:24:16", "22-01-2022 17:28:22", "09-01-2022 17:02:11", "12-01-2022 19:55:16", "15-01-2022 08:24:16", "22-01-2022 17:28:22", "10-01-2022 18:49:17", "12-01-2022 19:55:16", "15-01-2022 08:24:16", "16-01-2022 10:38:48"] } )

Now, in order to calculate how many bed mates the patients have had in their respective rooms for how long for (in hours and days), the following will do the work

newdf = patient_df.groupby(["patient_id", "room_id"]).agg({"start_date": "min", "end_date": "max"}).reset_index()

If one wants to sort the dataframe newdf by start_date starting from the earliest date, the following will do the work

newdf.sort_values(by=["start_date"], inplace=True)

This is the output of the final dataframe

print(newdf)

[Out]:
                             start_date             end_date
patient_id room_id                                          
1          1001     01-01-2022 04:30:26  05-01-2022 10:14:01
4          1001     02-01-2022 06:36:41  16-01-2022 10:38:48
2          1004     03-01-2022 10:15:35  09-01-2022 17:02:11
1          1002     05-01-2022 10:14:01  10-01-2022 14:32:04
3          1004     05-01-2022 12:24:15  10-01-2022 18:49:17
2          1007     09-01-2022 17:02:11  22-01-2022 17:28:22
1          1003     10-01-2022 14:32:04  15-01-2022 08:24:16
3          1003     10-01-2022 18:49:17  12-01-2022 19:55:16
2          1003     12-01-2022 19:55:16  15-01-2022 08:24:16
3          1007     12-01-2022 19:55:16  15-01-2022 08:24:16
1          1005     15-01-2022 08:24:16  22-01-2022 17:28:22

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1

'How to calculate bed mates and length of time spent together

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]