How to use np.select with multiple conditions?

I have this kind of dataframe (the values are only 1.0 or 2.0):

column_1 column_2 column_3
1.0 2.0 2.0
1.0 1.0 2.0
2.0 1.0 1.0
... ... ...

I would like to create a new column based on the row values of the other three columns. In pseudocode, this would be:

if column_1 = 1.0 or column_2 = 1.0 or column_3 = 1.0, return 1.0 in df['column_4']

if (column_1 = 1.0 and column_2 = 1.0) or (column_1 = 1.0 and column_3 = 1.0) or (column_2 = 1.0 and column_3 = 1.0), return 2.0 in df['column_4']

if column_1 = 1.0 and column_2 = 1.0 and column_3 = 1.0, return 3.0 in df['column_4']

else 0.0

I can't work out how to write this in code. This is my attempt so far:

sum_1 = df.loc[(df.column_1 == 1.0) | (df.column_2 == 1.0) | (df.column_3 == 1.0)]
sum_2 = df.loc[((df.column_1 == 1.0) & (df.column_2 == 1.0)) | ((df.column_1 == 1.0) & (df.column_3 == 1.0)) | (df.column_2 == 1.0) & (df.column_3 == 1.0)]
sum_3 = df.loc[(df.column_1 == 1.0) & (df.column_3 == 1.0) & (df.column_2 == 1.0)]


df['column_4'] = np.select([sum_1, sum_2, sum_3], [1.0, 2.0 , 3.0], default=0.0) 

But I get a ValueError: shape mismatch: objects cannot be broadcast to a single shape

Sorry if my question is an easy one, and thanks for any help!



Solution 1:[1]

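You need to remove the loc part from your sum_x variables: np.select expects boolean masks, not the filtered DataFrames that df.loc[...] returns. Also, np.select checks the conditions in order and uses the first one that matches, so you need to put the strictest condition first: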

# Boolean masks (no .loc): one True/False value per row
sum_1 = (df.column_1 == 1.0) | (df.column_2 == 1.0) | (df.column_3 == 1.0)
sum_2 = ((df.column_1 == 1.0) & (df.column_2 == 1.0)) | ((df.column_1 == 1.0) & (df.column_3 == 1.0)) | ((df.column_2 == 1.0) & (df.column_3 == 1.0))
sum_3 = (df.column_1 == 1.0) & (df.column_2 == 1.0) & (df.column_3 == 1.0)

# Strictest condition first, because np.select takes the first match
df['column_4'] = np.select([sum_3, sum_2, sum_1], [3.0, 2.0, 1.0], default=0.0)
print(df)

   column_1  column_2  column_3  column_4
0       1.0       2.0       2.0       1.0
1       1.0       1.0       2.0       2.0
2       2.0       1.0       1.0       2.0
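
To see why the order of the conditions matters, here is a minimal self-contained sketch. It assumes the three sample rows from the question plus two extra rows (all 1.0 and all 2.0) added for illustration; the names any_one, two_plus, all_three, wrong and right are made up for this example.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, plus two assumed rows: all 1.0 and all 2.0.
df = pd.DataFrame({
    "column_1": [1.0, 1.0, 2.0, 1.0, 2.0],
    "column_2": [2.0, 1.0, 1.0, 1.0, 2.0],
    "column_3": [2.0, 2.0, 1.0, 1.0, 2.0],
})

c1 = df["column_1"] == 1.0
c2 = df["column_2"] == 1.0
c3 = df["column_3"] == 1.0

any_one = c1 | c2 | c3                        # at least one column is 1.0
two_plus = (c1 & c2) | (c1 & c3) | (c2 & c3)  # at least two columns are 1.0
all_three = c1 & c2 & c3                      # all three columns are 1.0

# Loosest condition first: every row with a 1.0 matches any_one and stops there.
wrong = np.select([any_one, two_plus, all_three], [1.0, 2.0, 3.0], default=0.0)

# Strictest condition first: each row gets the most specific match.
right = np.select([all_three, two_plus, any_one], [3.0, 2.0, 1.0], default=0.0)

print(wrong.tolist())  # [1.0, 1.0, 1.0, 1.0, 0.0]
print(right.tolist())  # [1.0, 2.0, 2.0, 3.0, 0.0]
```

With the loosest condition first, the 2.0 and 3.0 choices are never reached, because any row that satisfies a stricter condition also satisfies any_one.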

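Or you can count how many of the three columns equal 1.0 in each row. Select the three columns explicitly so that an existing column_4 is not counted, and cast to float because the sum is an integer:

df['column_4'] = df[['column_1', 'column_2', 'column_3']].eq(1).sum(axis=1).astype(float)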
print(df)

   column_1  column_2  column_3  column_4
0       1.0       2.0       2.0       1.0
1       1.0       1.0       2.0       2.0
2       2.0       1.0       1.0       2.0
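
The counting approach works because the three cases in the question amount to counting how many columns equal 1.0. Below is a minimal self-contained sketch, assuming the sample frame from the question; the names cols, counts and again are illustrative.

```python
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame({
    "column_1": [1.0, 1.0, 2.0],
    "column_2": [2.0, 1.0, 1.0],
    "column_3": [2.0, 2.0, 1.0],
})

cols = ["column_1", "column_2", "column_3"]

# eq(1) gives a boolean frame; summing across each row counts the True values.
counts = df[cols].eq(1).sum(axis=1)

# The sum is an integer Series; cast to float to match the 1.0/2.0/3.0 style.
df["column_4"] = counts.astype(float)

# Restricting to cols keeps the result stable even though column_4 now exists.
again = df[cols].eq(1).sum(axis=1).astype(float)

print(df["column_4"].tolist())  # [1.0, 2.0, 2.0]
```

A row with no 1.0 at all gets 0.0, which matches the default of the np.select version.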

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1