Pandas - grouping by multiple columns and sorting with one of them
My data consists of ridership data for a fictional bike sharing company.
- My data is contained in a dataframe df
- member_type has two values - member or casual
- day_of_week has seven values - each day of the week
- ride_duration is how long a ride lasts for one session
I want to compare the average ride_duration for each member_type depending on day of the week.
df.groupby(['member_type', 'day_of_week'])['ride_duration'].mean()
This code gives me what I want except that the day_of_week is in alphabetical order and not chronological.
I tried the code below, but I think that because I'm grouping by two columns, the reindex doesn't work as I had hoped.
cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df.groupby(['member_casual', 'day_of_week'])['ride_duration'].mean().reindex(cats)
Additionally, how would I proceed if I wanted to sort by both member_type and day_of_week? In this example I only have two member types, but suppose I had 10 and wanted to sort them as well. Thanks in advance!
Solution 1:[1]
You can try reindex with the level parameter, so that the day names are matched against the second index level only:
df.groupby(['member_casual', 'day_of_week'])['ride_duration'].mean().reindex(cats, level=1)
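To see what level=1 does, here is a minimal sketch on made-up data. The sample rows and values are assumptions for illustration only, and the column name member_casual simply follows the snippet above; adjust it to whatever your dataframe actually uses.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataset.
df = pd.DataFrame({
    "member_casual": ["member", "member", "casual", "casual", "member", "casual"],
    "day_of_week": ["Tuesday", "Monday", "Sunday", "Monday", "Monday", "Tuesday"],
    "ride_duration": [10.0, 20.0, 30.0, 40.0, 30.0, 50.0],
})

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# groupby sorts the day names alphabetically within each member type.
means = df.groupby(["member_casual", "day_of_week"])["ride_duration"].mean()

# Reorder only the second index level (day_of_week) to follow cats.
ordered = means.reindex(cats, level=1)
print(ordered)
```

Without level=1, reindex compares each day name against the full (member, day) index entries, which is why the plain reindex(cats) in the question does not give the expected result.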
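However, I would suggest converting the day_of_week column to an ordered categorical type before doing the groupby, so that the groups come out in chronological order and you don't have to reindex later. Similarly, you can convert the member_type column to an ordered categorical type if you want to control its sort order as well: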
import pandas as pd
# cats is the Monday-to-Sunday list defined in the question
df['day_of_week'] = pd.Categorical(df['day_of_week'], cats, ordered=True)
df.groupby(['member_casual', 'day_of_week'])['ride_duration'].mean()
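For the second part of the question (sorting by both columns), a rough sketch of the categorical approach applied to both grouping columns might look like this. The sample data and the member_order list are assumptions made up for the example, and observed=True is my own addition so that only combinations present in the data are returned.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataset.
df = pd.DataFrame({
    "member_casual": ["member", "member", "casual", "casual", "member", "casual"],
    "day_of_week": ["Tuesday", "Monday", "Sunday", "Monday", "Monday", "Tuesday"],
    "ride_duration": [10.0, 20.0, 30.0, 40.0, 30.0, 50.0],
})

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
member_order = ["member", "casual"]  # assumed custom order; list all 10 types here if you have 10

# Ordered categoricals make groupby sort by the given order instead of alphabetically.
df["day_of_week"] = pd.Categorical(df["day_of_week"], categories=cats, ordered=True)
df["member_casual"] = pd.Categorical(df["member_casual"], categories=member_order, ordered=True)

# observed=True keeps only the (member, day) pairs that actually occur.
means = df.groupby(["member_casual", "day_of_week"], observed=True)["ride_duration"].mean()
print(means)
```

If you leave out observed=True, the result can also include rows for category combinations that never occur in the data, depending on your pandas version, so it is worth setting it explicitly either way.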
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Shubham Sharma |
