'merge dataframes and add price data for each instance of an item ID
I am trying to merge two dataframes so that each instance of an item ID in DF3 displays the pricing data associated with the matching ID from DF1.
DF3 (what I am trying to accomplish)
| recipeID | itemID_out | qty_out | buy_price | sell_price | buy_quantity | sell_quantity | id_1_in | qty_id1 | buy_price | sell_price | buy_quantity | sell_quantity | id_2_in | qty_id2 | buy_price | sell_price | buy_quantity | sell_quantity | id_3_in | qty_id3 | buy_price | sell_price | buy_quantity | sell_quantity | id_4_in | qty_id4 | buy_price | sell_price | buy_quantity | sell_quantity | id_5_in | qty_id5 | buy_price | sell_price | buy_quantity | sell_quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1986 | 1 | 129 | 167 | 67267 | 21637 | 123 | 1 | 10 | 15 | 1500 | 3000 | 124 | 1 | 12 | 14 | 550 | 800 | 125 | 1 | 8 | 12 | 124 | 254 | 126 | 1 | 22 | 25 | 1251 | 890 | 127 | 1 | 64 | 72 | 12783 | 1251515 |
| 2 | 1987 | 1 | 1521 | 1675 | 654 | 1245 | 123 | 2 | 10 | 15 | 1500 | 3000 | ||||||||||||||||||||||||
| 3 | 1988 | 1 | 128376 | 131429 | 47 | 23 | 123 | 10 | 10 | 15 | 1500 | 3000 | 124 | 3 | 12 | 14 | 550 | 800 |
These are the two dataframes I am trying to merge from.
DF1: Contains 26863 rows; master list of item names, IDs, and price data. Pulled from API, new items can be added and will appear as new rows after an update request from the user.
| itemID | name | buy_price | sell_price | buy_quantity | sell_quantity |
|---|---|---|---|---|---|
| 1986 | XYZ | 129 | 167 | 67267 | 21637 |
| 123 | ABC | 10 | 15 | 1500 | 3000 |
| 124 | DEF | 12 | 14 | 550 | 800 |
DF2 (contains 12784 rows; recipes that combine from items in the master list. Pulled from API, new recipes can be added and will appear as new rows after an update request from the user.)
| recipeID | itemID_out | qty_out | id_1_in | qty_id1 | id_2_in | qty_id2 | id_3_in | qty_id3 | id_4_in | qty_id4 | id_5_in | qty_id5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1986 | 1 | 123 | 1 | 124 | 1 | 125 | 1 | 126 | 1 | 127 | 1 |
| 2 | 1987 | 1 | 123 | 2 | ||||||||
| 3 | 1988 | 1 | 123 | 10 | 124 | 3 |
Recipes can contain a combination of 1 to 5 items (null values occur) that consist of IDs from DF1 and/or the itemID_out column in DF2.
The "id_#_in" columns in DF2 can contain item IDs from the "itemID_out" column, due to that recipe using the item that is being output from another recipe.
I have tried to merge it using:
pd.merge(itemlist_modified, recipelist_modified, left_on='itemID', right_on='itemID_out')
But this only ever results in a single column of ideas receiving the pricing data as intended.
I feel like I'm trying to use the wrong function for this, any help would be very much appreciated!
Thanks in advance!
Solution 1:[1]
Not a pretty approach, but it first melts the ingredient table into long form and then merges it on the itemlist table
import pandas as pd
import numpy as np
itemlist_modified = pd.DataFrame({
'itemID': [1986, 123, 124],
'name': ['XYZ', 'ABC', 'DEF'],
'buy_price': [129, 10, 12],
'sell_price': [167, 15, 14],
'buy_quantity': [67267, 1500, 550],
'sell_quantity': [21637, 3000, 800],
})
recipelist_modified = pd.DataFrame({
'RecipeID': [1, 2, 3],
'itemID_out': [1986, 1987, 1988],
'qty_out': [1, 1, 1],
'id_1_in': [123, 123, 123],
'qty_id1': [1, 2, 10],
'id_2_in': [124.0, np.nan, 124.0],
'qty_id2': [1.0, np.nan, 3.0],
'id_3_in': [125.0, np.nan, np.nan],
'qty_id3': [1.0, np.nan, np.nan],
'id_4_in': [126.0, np.nan, np.nan],
'qty_id4': [1.0, np.nan, np.nan],
'id_5_in': [127.0, np.nan, np.nan],
'qty_id5': [1.0, np.nan, np.nan],
})
#columns which are not qty or input id cols
id_vars = ['RecipeID','itemID_out','qty_out']
#prepare dict to map column name to ingredient number
col_renames = {}
col_renames.update({'id_{}_in'.format(i+1):'ingr_{}'.format(i+1) for i in range(5)})
col_renames.update({'qty_id{}'.format(i+1):'ingr_{}'.format(i+1) for i in range(5)})
#melt reciplist into longform
long_recipelist = recipelist_modified.melt(
id_vars=id_vars,
var_name='ingredient',
).dropna()
#add a new column to specify whether each row is a qty or an id
long_recipelist['kind'] = np.where(long_recipelist['ingredient'].str.contains('qty'),'qty_in','id_in')
#convert ingredient names
long_recipelist['ingredient'] = long_recipelist['ingredient'].map(col_renames)
#pivot on the new ingredient column
reshape_recipe_list = long_recipelist.pivot(
index=['RecipeID','itemID_out','qty_out','ingredient'],
columns='kind',
values='value',
).reset_index()
#merge with the itemlist
priced_ingredients = pd.merge(reshape_recipe_list, itemlist_modified, left_on='id_in', right_on='itemID')
#pivot on the priced ingredients
priced_ingredients = priced_ingredients.pivot(
index = ['RecipeID','itemID_out','qty_out'],
columns = 'ingredient',
)
#flatten the hierarchical columns
priced_ingredients.columns = ["_".join(a[::-1]) for a in priced_ingredients.columns.to_flat_index()]
priced_ingredients.columns.name = ''
priced_ingredients = priced_ingredients.reset_index()
priced_ingredients partial output:
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mitoRibo |

