'Add a column based on a condition that iterates over a list
So I have the following dataframe:
Person_x Person_y Apple_x Banana_x Orange_x Apple_y Banana_y Orange_y
Tomas Sidd 1 0 1 1 0 0
Tomas Frank 1 0 1 0 1 1
Tomas Anne 1 0 1 1 1 1
John Lucy 0 1 1 1 1 0
John Mark 0 1 1 0 0 0
John Mike 0 1 1 0 1 1
and what I want to do is: I want to add a new column to that dataframe that contains the name of the fruit that both persons being compared in that row like (both have a 1 on their respective fruits columns). So the result dataframe should look like this:
Person_x Person_y Apple_x Banana_x Orange_x Apple_y Banana_y Orange_y Common Fruits
Tomas Sidd 1 0 1 1 0 0 Apple
Tomas Frank 1 0 1 0 1 1 Orange
Tomas Anne 1 0 1 1 1 1 Apple, Orange
John Lucy 0 1 1 1 1 0 Banana
John Mark 0 1 1 0 0 0
John Mike 0 1 1 0 1 1 Banana, Orange
The thing is that my actual dataframe is huge (~ 100 k rows) and the fruits (columns) that I have to check whether are 1s for both persons will vary over time depending on a list that contains the fruits to compare (let's call this list "fruit_list"). Because of this I can't harcode the name of the columns to inspect, I was trying to get them from a for loop hat iterates over the fruit_list but I'm having a lot of trouble doing that.
Solution 1:[1]
Split your dataframe according the suffix then compare values by multiplying them and finally keep only right values:
strip_suffix = lambda x: x.split('_')[0]
df['Common fruits'] = (
df.filter(regex=r'(?<!Person)_x').rename(columns=strip_suffix)
.mul(df.filter(regex='(?<!Person)_y').rename(columns=strip_suffix))
.melt(ignore_index=False).query('value == 1')
.groupby(level=0)['variable'].agg(', '.join)
)
Output:
>>> df
Person_x Person_y Apple_x Banana_x Orange_x Apple_y Banana_y Orange_y Common fruits
0 Tomas Sidd 1 0 1 1 0 0 Apple
1 Tomas Frank 1 0 1 0 1 1 Orange
2 Tomas Anne 1 0 1 1 1 1 Apple, Orange
3 John Lucy 0 1 1 1 1 0 Banana
4 John Mark 0 1 1 0 0 0 NaN
5 John Mike 0 1 1 0 1 1 Banana, Orange
Solution 2:[2]
Here is another way:
i = ['Person_x','Person_y']
df = df.set_index(i)
df2 = (df.set_axis(df.columns.str.split('_').str[0],axis=1)
.rename_axis('Common Fruits',axis=1)
.astype(bool)
.groupby(level=0,axis=1).all())
df.join(df2.where(df2).stack().reset_index(-1)['Common Fruits'].groupby(level=[0,1]).agg(','.join),on = i).reset_index()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |
| Solution 2 | rhug123 |
