Loop over two columns to check for identity of elements and create new column

I have the following dataset (this is only a small part of it):

Screenshot

  • Right now each "product_id" corresponds to an "order_id"
  • I have to create a new column with the "product_id" for each "order_id_OK"
  • The majority of the elements of "order_id_OK" are also in "order_id", but in a different order

So the objective would be to have a column where each "product_id" corresponds to the row of "order_id_OK" and not of "order_id"

Right now I'm trying to set up a for loop:

l = []
for i in df["order_id_OK"]:
    for j in df["order_id"]:
        if i == j:
            for x in df["product_id"]:
                l.append(x)

Any ideas?



Solution 1:[1]

You can merge your dataframe with itself. The output is a dataframe containing the row pairs where
data['order_id'][j] == data['order_id_OK'][i] (i and j have the same meaning as in your for loops).

merged_data = data.merge(data, left_on='order_id', right_on='order_id_OK', how='inner')

In the merged data you will find the new columns 'order_id_OK_y' and 'product_id_x', which correspond to your desired output.
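
To make the self-merge concrete, here is a minimal sketch on made-up data. The real dataset is only visible in the screenshot, so the values below and the column name product_id_OK are assumptions for illustration. Note that an inner merge drops any "order_id_OK" without a match in "order_id" and does not keep the original row order. If each "order_id" appears only once (also an assumption), a lookup with Series.map is a possible alternative that keeps one row per "order_id_OK" in place.

```python
import pandas as pd

# Hypothetical toy data standing in for the dataset in the screenshot.
df = pd.DataFrame({
    "order_id":    [101, 102, 103, 104],
    "product_id":  ["A", "B", "C", "D"],
    "order_id_OK": [103, 101, 104, 999],  # 999 has no match in order_id
})

# Self-merge as in the answer: the left side contributes order_id/product_id,
# the right side contributes order_id_OK. Overlapping column names receive
# the default suffixes _x (left) and _y (right).
merged = df.merge(df, left_on="order_id", right_on="order_id_OK", how="inner")
pairs = merged[["order_id_OK_y", "product_id_x"]]
print(pairs)

# Alternative sketch: build an order_id -> product_id lookup and map it onto
# order_id_OK. This keeps the original row order and yields NaN where an
# order_id_OK has no matching order_id. Assumes one product per order_id;
# duplicates are reduced to their first occurrence here.
lookup = df.drop_duplicates(subset="order_id").set_index("order_id")["product_id"]
df["product_id_OK"] = df["order_id_OK"].map(lookup)
print(df[["order_id_OK", "product_id_OK"]])
```

With the map variant, the new column sits next to "order_id_OK" in the original dataframe, so no suffixed columns need to be cleaned up afterwards.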

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Triki Sadok