Pandas: group by index hour and keep every observation within each hour

I have a pandas DataFrame with one column and a datetime index. I need to group the data by hour and keep each observation (record) within each hourly group. The following method groups the data:

data = data.groupby(data.index.hour).max()

The issue is that this method groups the data using an aggregator (max, mean or min), whereas I need to keep each of the observations.
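To make the problem concrete, here is a minimal sketch of the approach from the question on a small hypothetical frame (the column name data_col and the timestamps are made up for illustration). Five observations spread over two days end up as two rows, one per hour of day, and only the maximum of each hour survives.

```python
import pandas as pd

# Hypothetical sample: one column ("data_col") and a DatetimeIndex
# covering 09:xx and 10:xx on two different days.
idx = pd.to_datetime([
    "2021-01-01 09:15", "2021-01-01 09:45", "2021-01-01 10:30",
    "2021-01-02 09:05", "2021-01-02 10:10",
])
data = pd.DataFrame({"data_col": [1, 2, 3, 4, 5]}, index=idx)

# The approach from the question: group on the hour of day, then reduce.
hourly_max = data.groupby(data.index.hour).max()
print(hourly_max)

# Five observations collapse into two rows (hours 9 and 10);
# the individual values 1, 2 and 3 are no longer visible.
```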



Solution 1:[1]

If you group by the hour of the day, you'll get at most 24 groups, one for each hour (0 to 23), and observations from different days that share the same hour are pooled into the same group rather than kept separate.
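A minimal sketch of that pooling effect, assuming the same kind of hypothetical one-column frame (data_col and the timestamps are illustrative only). Iterating over the groupby object yields each hour's rows as a full DataFrame slice, so nothing is aggregated, but 09:xx rows from both days share one group.

```python
import pandas as pd

# Hypothetical sample data spanning two days.
idx = pd.to_datetime([
    "2021-01-01 09:15", "2021-01-01 09:45", "2021-01-01 10:30",
    "2021-01-02 09:05", "2021-01-02 10:10",
])
data = pd.DataFrame({"data_col": [1, 2, 3, 4, 5]}, index=idx)

# Keys are the hour of day only, so the date is ignored.
by_hour = data.groupby(data.index.hour)

# Each group keeps its original rows (and timestamps) intact.
for hour, group in by_hour:
    print(hour, group["data_col"].tolist())
```

Because each group is an ordinary DataFrame, the original timestamps remain available in group.index if you need to tell the days apart afterwards.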

If your intention is to collect the observations that occur within each hour and view them over time (a separate group for every hour of every day), you'll want to use the pandas.Grouper class.

In either case, you can apply list to each group to keep all of its observations individually instead of aggregating them.

Note: You'll need to replace "data_col" with the name of the column in your DataFrame that contains the observations.

Grouping to hour number

data = data.groupby(data.index.hour)["data_col"].apply(list)
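A minimal, runnable sketch of the hour-number grouping with apply(list), using a hypothetical frame whose column is literally named data_col. The result is a Series indexed by hour of day whose values are Python lists holding every observation for that hour, pooled across days.

```python
import pandas as pd

# Hypothetical sample data spanning two days.
idx = pd.to_datetime([
    "2021-01-01 09:15", "2021-01-01 09:45", "2021-01-01 10:30",
    "2021-01-02 09:05", "2021-01-02 10:10",
])
data = pd.DataFrame({"data_col": [1, 2, 3, 4, 5]}, index=idx)

# One list per hour of day; values keep their original order.
hourly_lists = data.groupby(data.index.hour)["data_col"].apply(list)
print(hourly_lists)
```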

Grouping to hourly time sequence

import pandas as pd

time_grouper = pd.Grouper(freq="1h")

data = data.groupby(time_grouper)["data_col"].apply(list)
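A minimal sketch of the pd.Grouper variant on the same hypothetical data (data_col and the timestamps are assumptions). Here each clock hour of each day is its own group, so the 09:00 bins of the two days stay separate. The bins cover every hour between the first and last timestamp, so empty hours may appear in the output; the sketch filters out anything without observations.

```python
import pandas as pd

# Hypothetical sample data spanning two days.
idx = pd.to_datetime([
    "2021-01-01 09:15", "2021-01-01 09:45", "2021-01-01 10:30",
    "2021-01-02 09:05", "2021-01-02 10:10",
])
data = pd.DataFrame({"data_col": [1, 2, 3, 4, 5]}, index=idx)

# Group on hourly bins of the DatetimeIndex (no key needed).
time_grouper = pd.Grouper(freq="1h")
hourly = data.groupby(time_grouper)["data_col"].apply(list)

# Keep only bins that actually contain observations
# (empty bins give a length of 0 or NaN, both filtered out).
hourly = hourly[hourly.str.len() > 0]
print(hourly)
```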

Solution 2:[2]

Alternative to GroupBy

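You can also use pandas' resample method to upsample or downsample to a desired frequency, provided your index is a DatetimeIndex. Note that resample still reduces each hour to a single value with whichever aggregator you choose (here, the sum).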

import pandas as pd

# set the 'date' column as the index, assuming it isn't already, and make sure it is in datetime format
data['date'] = pd.to_datetime(data['date'])
data = data.set_index('date')

# resample your data to hourly frequency by taking the sum of all values within each hour
hourly = data.resample('h').sum()
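A minimal sketch of the resample approach on a small hypothetical frame with a separate 'date' column (the column name data_col and the timestamps are made up). It also shows one practical difference from grouping by hour: resample emits a row for every hour in the range, so the 11:00 hour, which has no observations, appears with a sum of 0.

```python
import pandas as pd

# Hypothetical sample data with the timestamps in a regular column.
data = pd.DataFrame({
    "date": ["2021-01-01 09:15", "2021-01-01 09:45",
             "2021-01-01 10:30", "2021-01-01 12:05"],
    "data_col": [1, 2, 3, 4],
})

# Convert to datetime and move it into the index so resample can use it.
data["date"] = pd.to_datetime(data["date"])
data = data.set_index("date")

# One row per clock hour; each value is the sum of that hour's observations.
hourly_sum = data.resample("h").sum()
print(hourly_sum)
```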

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Chris Farr
Solution 2 Gene Burinsky