'How to calculate bed mates and length of time spent together
I am working with some patient data in Pandas that looks roughly like the following
patient_id | room_id | start_date | end_date |
---|---|---|---|
1 | 1001 | 01-01-2022 04:30:26 | 05-01-2022 10:14:01 |
1 | 1002 | 05-01-2022 10:14:01 | 10-01-2022 14:32:04 |
1 | 1003 | 10-01-2022 14:32:04 | 15-01-2022 08:24:16 |
1 | 1005 | 15-01-2022 08:24:16 | 22-01-2022 17:28:22 |
2 | 1004 | 03-01-2022 10:15:35 | 09-01-2022 17:02:11 |
2 | 1007 | 09-01-2022 17:02:11 | 12-01-2022 19:55:16 |
2 | 1003 | 12-01-2022 19:55:16 | 15-01-2022 08:24:16 |
2 | 1007 | 15-01-2022 08:24:16 | 22-01-2022 17:28:22 |
3 | 1004 | 05-01-2022 12:24:15 | 10-01-2022 18:49:17 |
3 | 1003 | 10-01-2022 18:49:17 | 12-01-2022 19:55:16 |
3 | 1007 | 12-01-2022 19:55:16 | 15-01-2022 08:24:16 |
4 | 1001 | 02-01-2022 06:36:41 | 16-01-2022 10:38:48 |
I am interested to calculate how many bed mates that patients have had in their respective rooms, and for how long for (in hours and days).
The data needs to be sorted by start_date to try and understand chronologically where patients started moving from and to.
So far I have tried df.groupby('room_id').start_date.diff()
, but didn't quite get the results I expected.
How best can I approach this problem?
Solution 1:[1]
First of all, let's start by creating the dataframe OP mentions
import pandas as pd
patient_df = pd.DataFrame( { "patient_id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4], "room_id": [1001, 1002, 1003, 1005, 1004, 1007, 1003, 1007, 1004, 1003, 1007, 1001], "start_date": ["01-01-2022 04:30:26", "05-01-2022 10:14:01", "10-01-2022 14:32:04", "15-01-2022 08:24:16", "03-01-2022 10:15:35", "09-01-2022 17:02:11", "12-01-2022 19:55:16", "15-01-2022 08:24:16", "05-01-2022 12:24:15", "10-01-2022 18:49:17", "12-01-2022 19:55:16", "02-01-2022 06:36:41"], "end_date": ["05-01-2022 10:14:01", "10-01-2022 14:32:04", "15-01-2022 08:24:16", "22-01-2022 17:28:22", "09-01-2022 17:02:11", "12-01-2022 19:55:16", "15-01-2022 08:24:16", "22-01-2022 17:28:22", "10-01-2022 18:49:17", "12-01-2022 19:55:16", "15-01-2022 08:24:16", "16-01-2022 10:38:48"] } )
Now, in order to calculate how many bed mates the patients have had in their respective rooms for how long for (in hours and days), the following will do the work
newdf = patient_df.groupby(["patient_id", "room_id"]).agg({"start_date": "min", "end_date": "max"}).reset_index()
If one wants to sort the dataframe newdf
by start_date
starting from the earliest date, the following will do the work
newdf.sort_values(by=["start_date"], inplace=True)
This is the output of the final dataframe
print(newdf)
[Out]:
start_date end_date
patient_id room_id
1 1001 01-01-2022 04:30:26 05-01-2022 10:14:01
4 1001 02-01-2022 06:36:41 16-01-2022 10:38:48
2 1004 03-01-2022 10:15:35 09-01-2022 17:02:11
1 1002 05-01-2022 10:14:01 10-01-2022 14:32:04
3 1004 05-01-2022 12:24:15 10-01-2022 18:49:17
2 1007 09-01-2022 17:02:11 22-01-2022 17:28:22
1 1003 10-01-2022 14:32:04 15-01-2022 08:24:16
3 1003 10-01-2022 18:49:17 12-01-2022 19:55:16
2 1003 12-01-2022 19:55:16 15-01-2022 08:24:16
3 1007 12-01-2022 19:55:16 15-01-2022 08:24:16
1 1005 15-01-2022 08:24:16 22-01-2022 17:28:22
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |