'Create new columns based on column values

so I have the following dataframe. In essence it gives me the participation of two commodities (commodity 55 and 73) relative to the world's trade value, that for every country in the world. What I need is to create two new columns that give me the participation of commodity 55 and commodity 73 for each country (given that those are the commodities I have in the data). I would name those columns commodity_p55 and commodity_p73, for example. Thank you very much.

The dataframe looks like this:

enter image description here



Solution 1:[1]

Your desired output is not clear entirely. However I can suggest the following based on the snippet of the table you have shared.

With df as your dataframe, use Numpy's np.where() to create and update the new columns 'commodity_p55' and 'commodity_p73'. Where there are no values corresponding to the condition, the np.where() clause will insert a 0, as specified in the code below. You can change this to whatever you would like it to be. Alternatively, you can use df.loc()

df['commodity_p55'] = np.where(df['commodity_code'] == 55, df['participation'], 0) 
df['commodity_p73'] = np.where(df['commodity_code'] == 73, df['participation'], 0) 

Then use groupby, to group data by each country with corresponding participation values in the 'commodity_p55' and 'commodity_p73'. Groupby as many columns as you would like - I have demonstrated with just the 'country_name' and use a sum() function to aggregate - if all values appear once, the sum will simply return those values once grouped.

df.groupby('country_name').sum()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 alkes