Calculate percentages by multiple columns in Python

I need to calculate the share of observations over a multilevel group. Consider the following data:

import numpy as np
import pandas as pd

id_1 = np.array([1,1,1,1,1,1,2,2,2,2]).reshape(-1,1)
id_2 = np.array(['a','a','a','b','b','b','b','c','c','c']).reshape(-1,1)
df = pd.DataFrame(data=np.c_[id_1, id_2], columns=['id_1', 'id_2'])

Now, I need to calculate the share of each id_2 value within each id_1 group, so that the percentages add up to 100% for every value of id_1. I managed to get the desired result using this:

# Count rows per (id_1, id_2) pair and per id_1 value
cnt_all = df.value_counts().reset_index()
cnt_id_1 = df['id_1'].value_counts().reset_index()
cnt_all.columns = ['id_1', 'id_2', 'cnt']
cnt_id_1.columns = ['id_1', 'cnt']

# Both frames have a 'cnt' column, so the merge renames them to cnt_x (pair count) and cnt_y (id_1 total)
df_joined = cnt_all.merge(cnt_id_1, how='left', on='id_1')
df_joined['share'] = df_joined['cnt_x'] / df_joined['cnt_y']

However, this solution seems rather clunky to me. Is there a neater way to do this in pandas?
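
One possible way to shorten this, as a minimal sketch rather than a definitive answer: count the rows per (id_1, id_2) pair with groupby, then divide each count by the total of its id_1 group using transform('sum'). The names counts, cnt and share are only illustrative, and the frame is rebuilt from a dict here (an assumption, so that id_1 keeps an integer dtype instead of being cast to a string by np.c_).

```python
import pandas as pd

# Same observations as in the question, built from a dict (assumption:
# this keeps id_1 as integers rather than strings).
df = pd.DataFrame({
    'id_1': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    'id_2': ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c'],
})

# Number of rows for every (id_1, id_2) combination.
counts = df.groupby(['id_1', 'id_2']).size().rename('cnt').reset_index()

# Divide each pair count by the total count of its id_1 group.
# transform('sum') returns a result aligned with counts, so no merge is needed.
counts['share'] = counts['cnt'] / counts.groupby('id_1')['cnt'].transform('sum')

print(counts)
```

If only the shares are needed, df.groupby('id_1')['id_2'].value_counts(normalize=True) should give the same proportions as a Series indexed by both columns.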



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
