'Groupby by sum of revenue and the corresponding highest contributing month - Pandas

I have a bill details data set and I want to do a groupby of the products based on the sum of their Total value, additionally i want a column which indicates the month which has produced the most revenue for the corresponding product

Data set:

Bill_Id  Month  Product_Id  Net_Value
 1        1       20          100
 2        1       20          100
 3        2       20          100
 4        1       30          200
 5        2       30          200
 6        2       30          200

Desired_Result

Product_Id  Total_revenue  Top_Month
   20          300            1
   30          600            2

This just a sample dataset I have the transaction data of the entire year

Solution 1:^[1]

Assuming that your only 1 Top month value is needed based on the maximum sum of Net_Revenue, Below is the code that might work for you.

We can achieve this in 3 stages as mentioned below:

1. Extracting the sum of net revenue based on product id

df_1 = df.groupby(['Product_Id']).agg({'Net_Value' : sum}).reset_index()
df_1 = df_1.rename(columns={'Net_Value' : 'Total_revenue'})

print(df_1)
Product_Id  Total_revenue
0   20  300
1   30  600

2. Extracting the best contibuting month based on max sum net revenue for each product id

df_2 = df.groupby(['Product_Id', 'Month']).agg({'Net_Value' : sum}).sort_values('Net_Value', ascending=False).reset_index()
df_2 = df_2.drop_duplicates(subset=['Product_Id'])[['Product_Id', 'Month']]

print(df_2)
Product_Id  Month
0   30  2
1   20  1

3. Final step is to merge this both dataframes into single based on product id

final_df = df_1.merge(df_2)

print(final_df)
Product_Id  Total_revenue  Month
0          20            300      1
1          30            600      2

Please do upvote the solution if it helps :)

Solution 2:^[2]

Small modification over @Shubham's approach

result = (
    df.pivot_table("Net_Value", "Product_Id", "Month", aggfunc="sum")
    .agg(["sum", "idxmax"], axis=1)
    .set_axis(["Total_revenue", "Top_Month"], axis=1)
)

Solution 3:^[3]

As multiple columns are being interacted, I have used the apply function in addition to groupby:

Net_value is calucated using basic aggregate function sum

Top_month required interaction between columns so, first get the index of max Net_value using idxmax then using loc to find the month

The resultant Pandas Series object has the groupby column (Product_id) as index, so it make it a column I have used reset_index

def f(x):
    d = {}
    d['Net_Value'] = x['Net_Value'].sum()
    d['Top_month'] = df.loc[x['Net_Value'].idxmax(), "Month"]
    return pd.Series(d, index=['Net_Value', 'Top_month'])

df.groupby('Product_Id').apply(f).reset_index()

# Output

  Product_Id    Net_Value   Top_month
0   20             300         1
1   30             600         2

Check out this amazing answer which helped me and can help you in the future as well.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2	Mark Wang
Solution 3	Uchiha012

'Groupby by sum of revenue and the corresponding highest contributing month - Pandas

Solution 1:[1]