'Add a column based on a condition that iterates over a list

So I have the following dataframe:

Person_x     Person_y     Apple_x     Banana_x     Orange_x     Apple_y     Banana_y     Orange_y
Tomas         Sidd           1           0            1            1           0             0
Tomas         Frank          1           0            1            0           1             1
Tomas         Anne           1           0            1            1           1             1 
John          Lucy           0           1            1            1           1             0    
John          Mark           0           1            1            0           0             0   
John          Mike           0           1            1            0           1             1 

and what I want to do is: I want to add a new column to that dataframe that contains the name of the fruit that both persons being compared in that row like (both have a 1 on their respective fruits columns). So the result dataframe should look like this:

Person_x     Person_y     Apple_x     Banana_x     Orange_x     Apple_y     Banana_y     Orange_y     Common Fruits
Tomas         Sidd           1           0            1            1           0             0        Apple
Tomas         Frank          1           0            1            0           1             1        Orange
Tomas         Anne           1           0            1            1           1             1        Apple, Orange 
John          Lucy           0           1            1            1           1             0        Banana 
John          Mark           0           1            1            0           0             0   
John          Mike           0           1            1            0           1             1        Banana, Orange

The thing is that my actual dataframe is huge (~ 100 k rows) and the fruits (columns) that I have to check whether are 1s for both persons will vary over time depending on a list that contains the fruits to compare (let's call this list "fruit_list"). Because of this I can't harcode the name of the columns to inspect, I was trying to get them from a for loop hat iterates over the fruit_list but I'm having a lot of trouble doing that.



Solution 1:[1]

Split your dataframe according the suffix then compare values by multiplying them and finally keep only right values:

strip_suffix = lambda x: x.split('_')[0]
df['Common fruits'] = (
    df.filter(regex=r'(?<!Person)_x').rename(columns=strip_suffix)
      .mul(df.filter(regex='(?<!Person)_y').rename(columns=strip_suffix))
      .melt(ignore_index=False).query('value == 1')
      .groupby(level=0)['variable'].agg(', '.join)
)

Output:

>>> df
  Person_x Person_y  Apple_x  Banana_x  Orange_x  Apple_y  Banana_y  Orange_y   Common fruits
0    Tomas     Sidd        1         0         1        1         0         0           Apple
1    Tomas    Frank        1         0         1        0         1         1          Orange
2    Tomas     Anne        1         0         1        1         1         1   Apple, Orange
3     John     Lucy        0         1         1        1         1         0          Banana
4     John     Mark        0         1         1        0         0         0             NaN
5     John     Mike        0         1         1        0         1         1  Banana, Orange

Solution 2:[2]

Here is another way:

i = ['Person_x','Person_y']
df = df.set_index(i)

df2 = (df.set_axis(df.columns.str.split('_').str[0],axis=1)
       .rename_axis('Common Fruits',axis=1)
       .astype(bool)
       .groupby(level=0,axis=1).all())

df.join(df2.where(df2).stack().reset_index(-1)['Common Fruits'].groupby(level=[0,1]).agg(','.join),on = i).reset_index()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Corralien
Solution 2 rhug123