Compare three columns of pandas df using np.where

I have a DataFrame as given:

TXN_DATE_TIME   TX_ID   CUST_ID STATE_1         STATE_2       STATE_3
01-06-2020 00:00    1   123      Maharashtra    Maharashtra   Maharashtra
01-06-2020 00:00    2   345      Pune           Chennai       Gujarat
01-06-2020 00:00    3   222      Chennai        Gujarat       Chennai
01-06-2020 00:00    4   1356     Gujarat        Chennai       Delhi
01-06-2020 00:00    5   2345     Punjab         Punjab        Delhi
01-06-2020 00:00    6   1111     Haryana        Delhi         Punjab
01-06-2020 00:00    7   5678     Delhi          Maharashtra   Haryana
01-06-2020 00:00    8   9999     Kerela         Assam         Assam
01-06-2020 00:00    9   2345     Assam          Assam         Assam
01-06-2020 00:00    10  6666     Tripura        Tripura       Tripura
01-06-2020 00:00    11  7896     Kolkatta       Kolkatta      Kolkatta

I want to create a new column MATCH in the df containing two values, 1 (match) and 0 (no match), based on the following condition:

If STATE_1 == STATE_2 == STATE_3 then MATCH = 1
Else MATCH = 0

Hence the expected df will be:

[Image: the DataFrame above with the added MATCH column]

I tried to use np.where in pandas df by using:

df['MATCH']=np.where(df['STATE_1']==df['STATE_2']==df['STATE_3'],1,0)

But it gives me the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I would like to know whether there is a faster method than np.where to achieve the expected result and, if not, how I can avoid this error.



Solution 1:[1]

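Python expands the chained comparison a == b == c into (a == b) and (b == c), and the `and` needs a single truth value for a whole Series, which is what raises the ValueError. Compare the columns pairwise and combine the two boolean Series with the element-wise & operator instead: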

df['MATCH']=(df['STATE_1']==df['STATE_2'])&(df['STATE_2']==df['STATE_3'])
df['MATCH'] = df['MATCH'].astype(int)
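
If you would rather keep np.where, the same pairwise comparison can be passed to it as the condition. The sketch below is only an illustration: it builds a small sample from a few rows of the question, and the MATCH_ALL column name and the cols list are assumptions introduced here, not part of the original post. It also shows one possible way to generalize to any number of columns by comparing each column against the first.

```python
import numpy as np
import pandas as pd

# Small sample built from a few rows of the question's data.
df = pd.DataFrame({
    "STATE_1": ["Maharashtra", "Pune", "Punjab", "Assam"],
    "STATE_2": ["Maharashtra", "Chennai", "Punjab", "Assam"],
    "STATE_3": ["Maharashtra", "Gujarat", "Delhi", "Assam"],
})

# np.where works once the two comparisons are joined with & (element-wise AND).
# The parentheses are required because & binds more tightly than ==.
df["MATCH"] = np.where(
    (df["STATE_1"] == df["STATE_2"]) & (df["STATE_2"] == df["STATE_3"]), 1, 0
)

# Hypothetical generalization: compare every listed column with the first one,
# row by row, and require all of the comparisons to be True.
cols = ["STATE_1", "STATE_2", "STATE_3"]
df["MATCH_ALL"] = df[cols].eq(df[cols[0]], axis=0).all(axis=1).astype(int)

print(df)
```

Both columns should hold the same values. Which variant is faster on a given DataFrame is best checked by timing it on your own data.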

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
