'Calculating the mix of values within a pandas groupby

Within a database I have fields 'DATE', 'country', 'itemID', and 'price'. What I'm trying to do is calculate the mix of items (which sum to 100%) across price bin by date group by country.

The following code calculates the number of objects by country/price bin/date (which is aggregated to weekly frequency) and I'm happy with the results.

bins = [0, 5000, 10000, 20000, 30000, 40000, float('inf')]
labels = ['<5k', '10k', '20k', '30k', '40k', '>40k']
df.groupby(['country', pd.Grouper(key='DATE', freq='1W'), pd.cut(df['price'], bins=bins, labels=labels, include_lowest=True)]).agg({'itemID': 'nunique'}).unstack('DATE')

What I'd like to do is, instead of showing me the number of items in each category, I get the percentage in each price bin for that country on that date.

So, for example:

DATE 2022-01-30 2022-02-06
country price
-------- --------
USA <5k 0% 1%
USA 10k 5% 19%
USA 20k 55% 25%
USA 30k 30% 55%
USA 40k 2% 0%
USA >40k 8% 0%
JAP <5k 0% 1%
JAP 10k 25% 0%
JAP 20k 45% 15%
JAP 30k 20% 35%
JAP 40k 2% 25%
JAP >40k 8% 25%

I tried the below to initially calculate the number of unique itemIDs by country and date and then accessing this in the second part:

def f(d):
    return d['itemID'] / d['nuniques']

df['nuniques'] = df.groupby(['country', pd.Grouper(key='DATE', freq='1W')])['itemID'].transform('nunique')
df.groupby(['country', pd.Grouper(key='DATE', freq='1W'), pd.cut(df['price'], bins=bins, labels=labels, include_lowest=True)]).agg({'itemID': 'nunique', 'nuniques': 'min'}).apply(lambda x: f(x), axis=1).unstack('AS_OF_DATE')

Excluding the cut function, it shows 100% across each country (as expected).

However, adding the cut function causes the sum of the items across the bins to exceed 100% (by <1%, but different amounts for each date).

Why would this be, and how can it be calculated more properly? At a guess, the discrepancy is something to do with the fact that not every itemID has an entry for every date within the weekly window (e.g. if it goes out of stock).



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source