How to quickly union certain sets in a pandas DataFrame with one set
I have the following DataFrame:
A B C
0 a1 {x1, x2, x3} {c1, c3, c5}
1 a2 {y1} {c1, c2, c3}
2 a3 {z1, z2} {c2, c4}
Now, for all rows where the set in column C contains both c1 and c3, I want to replace the set in B with its union with the set W = {w1, w2}. In this case, the result should be:
A B C
0 a1 {x1, x2, x3, w1, w2} {c1, c3, c5}
1 a2 {y1, w1, w2} {c1, c2, c3}
2 a3 {z1, z2} {c2, c4}
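This is what I'm currently doing:
uppersets = df.C.apply(lambda s: s.issuperset({'c1', 'c3'}))  # the condition is on column C
list_B = df[uppersets].B.to_list()
list_B = [item.union(W) for item in list_B]
# Index the new Series by the matching rows so the assignment aligns correctly
df.loc[uppersets, 'B'] = pd.Series(list_B, index=df.index[uppersets])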
But is there a more efficient way to do this? I could also move away from using sets, but I don't want the values in column B to contain duplicates.
Cheers in advance!
P.S. Here is code to instantiate the DataFrame:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [{1, 2, 3}, {1}, {1, 2}],
                   'C': [{1, 3, 5}, {1, 2, 3}, {2, 4}]})
# Row labels whose set in C contains both 1 and 3
ind_s = [j for j in range(3) if df.loc[j, 'C'].issuperset({1, 3})]
list_B = df.loc[ind_s].B.to_list()
list_B = [item.union({10, 20}) for item in list_B]
# Index the new Series by ind_s so the assignment aligns with the selected rows
df.loc[ind_s, 'B'] = pd.Series(data=list_B, index=ind_s)
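One possible shorter form of the same operation, as a sketch rather than a definitive answer: walk columns B and C together once and rebuild B with a list comprehension. The names `W` and `required` are introduced here for illustration only, and whether this is actually faster on a large DataFrame is an assumption that should be checked by timing it on real data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [{1, 2, 3}, {1}, {1, 2}],
                   'C': [{1, 3, 5}, {1, 2, 3}, {2, 4}]})

W = {10, 20}        # hypothetical name: the set to merge into B
required = {1, 3}   # hypothetical name: elements that C must contain

# Single pass over both columns: union W into b when c contains every
# required element, otherwise keep b unchanged.
df['B'] = [b | W if required <= c else b
           for b, c in zip(df['B'], df['C'])]
```

With the sample data, rows 0 and 1 satisfy the condition, so B ends up as {1, 2, 3, 10, 20}, {1, 10, 20} and {1, 2}. Because the values stay Python sets, duplicates cannot appear in B.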
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
