Iterate over pandas dataframe using value in separate dataframe, filtered by shared column
I have two dataframes in the following form:
df1
| id | name | df2_id |
|---|---|---|
| one | foo | template_x |
| two | bar | template_y |
| three | baz | template_z |
df2
| id | name | value |
|---|---|---|
| template_x | aaa | zzz |
| template_x | bbb | yyy |
| template_y | ccc | xxx |
| template_y | ddd | www |
| template_z | eee | vvv |
| template_z | fff | uuu |
For each row in df1, I'd like to find the rows in df2 where df2.id == df1.df2_id and append the value of df1.id to name and value in each of those rows to get:
df3
| id | concat_name | concat_val |
|---|---|---|
| template_x | aaa_one | zzz_one |
| template_x | bbb_one | yyy_one |
| template_y | ccc_two | xxx_two |
| template_y | ddd_two | www_two |
| template_z | eee_three | vvv_three |
| template_z | fff_three | uuu_three |
Constraints/caveats:
- All relevant values are strings, no integers.
- Sometimes df2.value is empty, and I would like to keep it empty.
My approach was to use a nested for loop with df.iterrows, but it's giving me trouble.
Solution 1:[1]
Here's a pretty sleek one-liner:
df2[['name', 'value']] = df2[['name', 'value']].T.add('_' + df1.set_index('df2_id').loc[df2['id'], 'id'].reset_index(drop=True)).T
Output:
>>> df2
           id       name      value
0  template_x    aaa_one    zzz_one
1  template_x    bbb_one    yyy_one
2  template_y    ccc_two    xxx_two
3  template_y    ddd_two    www_two
4  template_z  eee_three  vvv_three
5  template_z  fff_three  uuu_three
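One caveat: the question asks that empty cells in df2.value stay empty, and plain string addition would turn an empty string into something like `_one`. Below is a rough sketch of an alternative that looks up the suffix with `Series.map` and skips empty cells with `Series.where`. It assumes each `df2_id` appears only once in df1, and that "empty" means either an empty string or NaN. The empty value in the second row is made up for illustration, and the output column names are taken from the df3 in the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": ["one", "two", "three"],
    "name": ["foo", "bar", "baz"],
    "df2_id": ["template_x", "template_y", "template_z"],
})
df2 = pd.DataFrame({
    "id": ["template_x", "template_x", "template_y",
           "template_y", "template_z", "template_z"],
    "name": ["aaa", "bbb", "ccc", "ddd", "eee", "fff"],
    # The empty string in row 1 is an illustrative assumption.
    "value": ["zzz", "", "xxx", "www", "vvv", "uuu"],
})

# Look up the df1.id that belongs to each df2 row via the shared key.
# Assumes df2_id is unique in df1.
suffix = "_" + df2["id"].map(df1.set_index("df2_id")["id"])

df3 = pd.DataFrame({"id": df2["id"]})
for src, dst in [("name", "concat_name"), ("value", "concat_val")]:
    col = df2[src]
    is_empty = col.isna() | (col == "")
    # Keep empty cells untouched; append the suffix everywhere else.
    df3[dst] = col.where(is_empty, col + suffix)

print(df3)
```

This builds a new df3 instead of overwriting df2, so the original frame stays available for checking the result.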
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
