'Pandas dataframe Group by Time Interval and then ID with sum of Counts

I'm trying to group a dataset by time first and then group by ID using pandas, while summing the counts. My data looks something along the lines of this:

id,selected time,count
1,5/16/2022 3:58:06 PM,1
1,5/16/2022 3:55:10 PM,1
2,5/16/2022 3:52:01 PM,2
3,5/16/2022 3:19:33 PM,1
3,5/16/2022 3:15:04 PM,1
4,5/16/2022 3:12:38 PM,1
1,5/16/2022 2:42:58 PM,1
1,5/16/2022 2:26:13 PM,1
2,5/16/2022 2:21:02 PM,1
5,5/16/2022 2:18:21 PM,1
4,5/16/2022 2:15:18 PM,1

I'm trying to get my data to look something along the lines of this:

id,5/16/2022 2:00:00 PM,5/16/2022 3:00:00 PM
1,2,2
2,2,1
3,2,0
4,1,1
5,1,0

Of course, this is a subset of the data, and the whole dataset encompasses many more ids over a 24 hour time period.

Among many other methods, I've tried this:

df = df.groupby('id') \
    .resample('60min', on='selected time')['count']
    .sum() \
    .unstack(1, fill_value=0) \
    .reset_index(level=0)

But this method does not return the correct values (as it is grouping by id first and then time interval), and other methods I've tried either throw an error or also have the same problem.

I am new to pandas, so I am still learning. Any help would be greatly appreciated, Thanks!



Solution 1:[1]

first, a new columns is created where minute and seconds were made to zero, by flooring the hour. Then the Pivot_table gives the required result

df['selected_time_2'] = df['selected time'].astype('datetime64').dt.floor('h').dt.strftime('%m/%d/%YY %I:%M:%S %p')

df.pivot_table(index='id',columns='selected_time_2', values='count', aggfunc='sum').fillna('').reset_index() 
selected_time_2     id  05/16/2022Y 02:00:00 PM     05/16/2022Y 03:00:00 PM
0   1   2   2
1   2   1   2
2   3       2
3   4   1   1
4   5   1   

EDIT: To show the columns in 12 hrs format

import datetime

df['selected_time_2'] = df['selected time'].astype('datetime64').dt.floor('h')  #.dt.strftime('%m/%d/%Y %I:%M:%S %p').astype('datetime64')

df2 = df.pivot_table(index='id',columns='selected_time_2', values='count', aggfunc='sum').fillna('').reset_index() 

df2.columns = [col.strftime('%m/%d/%Y %I:%M:%S %p') 
               if (isinstance(col, datetime.date)  )
               else col
               for col in df2.columns]
df2

RESULT:

    id  05/16/2022 11:00:00 AM  05/16/2022 12:00:00 PM  05/16/2022 02:00:00 PM  05/16/2022 03:00:00 PM
0   1   1.0     1.0     2.0     
1   2           1.0     2.0
2   3               2.0
3   4           1.0     1.0
4   5           1.0     

aggfunc added to get the sum, missed the first time around in the response PS: I'm not sure how to format the result properly, and can take advise on it.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1