Join with Concat to Create a Tuple Column in Pandas
I have two pandas dataframes and need to join them on the external_id field to bring product_id into the first dataframe. The catch is that a single external_id can have more than one product, so the new column needs to hold a tuple of product IDs.
That is, the expected value of this new column should look like, for example: (1, 2, 3)
That way I can tell that the external_id has more than one product.
Dataframe 1 - the one that receives the join:
| id | external_id | column1 | column2 |
|---|---|---|---|
| 1 | a43505 | Example | 1 |
| 2 | 11b737 | Example | 1 |
| 3 | 3 | Example | 1 |
| 4 | lb22 | Example | 1 |
| 5 | 2 | Example | 1 |
Dataframe 2 - Dimension:
| product_id | external_id | product_name |
|---|---|---|
| 1 | a43505 | Product 1 |
| 2 | c911d8 | Product 2 |
| 3 | 11b737 | Product 3 |
| 4 | a43505 | Product 4 |
| 5 | 5b1381 | Product 5 |
| 6 | a43505 | Product 6 |
Expected:
| id | external_id | column1 | column2 | product_id |
|---|---|---|---|---|
| 1 | a43505 | Example | 1 | (1, 4, 6) |
| 2 | 11b737 | Example | 1 | (3) |
| 3 | 5b1381 | Example | 1 | (5) |
| 4 | lb22 | Example | 1 | () |
| 5 | 2 | Example | 1 | () |
Solution 1:[1]
You could use map:
df1["product_id"] = df1["external_id"].map(df2.groupby("external_id")["product_id"].agg(tuple))
>>> df1
id external_id column1 column2 product_id
0 1 a43505 Example 1 (1, 4, 6)
1 2 11b737 Example 1 (3,)
2 3 3 Example 1 NaN
3 4 lb22 Example 1 NaN
4 5 2 Example 1 NaN
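As a quick check, here is a self-contained version of this approach, rebuilding the two frames from the question. It also adds one step beyond the answer: replacing the NaN produced for unmatched ids with an empty tuple, to match the question's expected output (note that `fillna(())` does not accept a tuple, hence the `apply`):

```python
import pandas as pd

# Rebuild the example frames from the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "external_id": ["a43505", "11b737", "3", "lb22", "2"],
    "column1": ["Example"] * 5,
    "column2": [1] * 5,
})
df2 = pd.DataFrame({
    "product_id": [1, 2, 3, 4, 5, 6],
    "external_id": ["a43505", "c911d8", "11b737", "a43505", "5b1381", "a43505"],
    "product_name": [f"Product {i}" for i in range(1, 7)],
})

# Map each external_id to the tuple of its product_ids
lookup = df2.groupby("external_id")["product_id"].agg(tuple)
df1["product_id"] = df1["external_id"].map(lookup)

# Unmatched external_ids come back as NaN; replace them with an
# empty tuple to match the expected output in the question
df1["product_id"] = df1["product_id"].apply(
    lambda t: t if isinstance(t, tuple) else ()
)
print(df1)
```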
Solution 2:[2]
Here is one way:
df_grp_dim = df_dim.groupby('external_id')['product_id'].aggregate(list)
out = df.join(df_grp_dim,on='external_id')
Output:
id external_id column1 column2 product_id
0 1 a43505 Example 1 [1, 4, 6]
1 2 11b737 Example 1 [3]
2 3 3 Example 1 NaN
3 4 lb22 Example 1 NaN
4 5 2 Example 1 NaN
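A variant not shown in either answer, sketched here under the same frame names (`df` and `df_dim`) as Solution 2, is a plain left merge followed by a groupby that re-collapses the expanded rows into a tuple per original row; unmatched ids naturally end up as empty tuples:

```python
import pandas as pd

# Same example frames as the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "external_id": ["a43505", "11b737", "3", "lb22", "2"],
    "column1": ["Example"] * 5,
    "column2": [1] * 5,
})
df_dim = pd.DataFrame({
    "product_id": [1, 2, 3, 4, 5, 6],
    "external_id": ["a43505", "c911d8", "11b737", "a43505", "5b1381", "a43505"],
    "product_name": [f"Product {i}" for i in range(1, 7)],
})

# Left merge expands each row once per matching product...
merged = df.merge(
    df_dim[["external_id", "product_id"]], on="external_id", how="left"
)

# ...then collapse the products back into a tuple per original row.
# product_id becomes float after the merge (NaN for misses), so
# drop the NaNs and cast back to int before building the tuple.
out = (
    merged.groupby(["id", "external_id", "column1", "column2"], sort=False)["product_id"]
    .agg(lambda s: tuple(s.dropna().astype(int)))
    .reset_index()
)
print(out)
```

Compared with `map`, this keeps everything in one pipeline, at the cost of materializing the expanded intermediate frame.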
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | marc_s |
| Solution 2 | marc_s |
