'Python automation, filling in one dataframe based on another dataframe
I am writing an automation script for work and the key feature of it is hanging me up. I need to fill in one spreadsheet based on information from another spreadsheet. I've created a simplified scenario that replicates my problem.
One sheet, my reference sheet, has the lookup information spread among several columns:
https://www.dropbox.com/s/wby8yyljkyz1o74/lookup_ref.csv?dl=0
The fill-in sheet needs to have the relevant number filled in based on another matching value:
https://www.dropbox.com/s/h9knreacqamjvbg/fill_in_d_number.csv?dl=0
Here's my code:
import pandas as pd
df_ref = pd.read_csv('lookup_ref.csv')
df_fillin = pd.read_csv('fill_in_d_number.csv')
def get_d_number(food):
return df_ref[df_ref.eq(food).any(1)].d_number.item()
df_fillin.insert(0, 'd_number', get_d_number(df_fillin['food']))
and I'm getting this:
ValueError: can only convert an array of size 1 to a Python scalar
Can anyone help with this? It seems like something that should be really simple, and if it worked it would save me a LOT of work.
Solution 1:[1]
I am not sure if I really understand what you are trying to do. What would you have given as input and what do you want the output DataFrame to look like?
How I understand it: You have a lookup DataFrame and a fill-in DataFrame and you want to add a column 'd_number' into the fill-in DataFrame where the value of each row is the 'd_number' value of the row in the lookup DataFrame where any column matches the lookup DataFrames 'food' value in that row.
import pandas as pd
import numpy as np
lookup = pd.DataFrame({"d_number": [1, 2, 3, 4, 5],
"a": ["apple", np.NaN, np.NaN, np.NaN, np.NaN],
"b": [np.NaN, "banana", np.NaN, np.NaN, np.NaN],
"c": ["steak", "peanut", np.NaN, np.NaN, np.NaN],
"d": ["bread", np.NaN, np.NaN, "sausage", np.NaN],
})
fillin = pd.DataFrame({"food": ["apple", "banana", "steak", "peanut", "bread", "sausage"]})
One quick and dirty way to solve this would be as follows:
1 - Merge the four columns into a list column, skipping NA-values.
lookup["merged"] = [[v for v in row if v==v] for row in lookup[["a", "b", "c", "d"]].values.tolist()]
2 - Iterate over a flattened version of that list column and that rows d_number, assign the d_number as value to this food as key in a dictionary.
mapping = {val: id_ for id_, row in zip(lookup["d_number"], lookup["merged"]) for val in row}
3 - Use the resulting mapping dictionary to create a new column by looking up the d_number for each food.
fillin["d_number"] = fillin["food"].apply(lambda f: mapping[f])
Which results in this output:
food d_number
0 apple 1
1 banana 2
2 steak 1
3 peanut 2
4 bread 1
5 sausage 4
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ewz93 |
