'How to iterate over columns, with similar names, and check whether they are equal?
This question is a follow-up to a similar one I posted a few days ago.
I have a data frame -- example shown below -- containing columns with similar names: 'Agreement Date', 'Agreement Date_x', etc. I am trying to figure out how to check whether my columns match up, and then create a new yes/no column if they do or do not match up.
| Contract ID | Agreement Date | Agreement Date_x |
|---|---|---|
| 0 | 05/04/1997 | 05/04/1997 |
| 1 | 03/02/1997 | 03/04/1997 |
| 2 | 05/07/2020 | 05/06/2021 |
Note: my actual dataset contains a dozen 'Agreement Date' columns
Many thanks :)
Solution 1:[1]
You could filter on the leading name and count if the value is unique:
df['match'] = df.filter(regex='^Agreement Date').nunique(axis=1).eq(1)
output:
Contract ID Agreement Date Agreement Date_x match
0 0 05/04/1997 05/04/1997 True
1 1 03/02/1997 03/04/1997 False
2 2 05/07/2020 05/06/2021 False
as yes/no:
import numpy as np
df['match'] = np.where(df.filter(regex='^Agreement Date').nunique(axis=1).eq(1), 'yes', 'no')
output:
Contract ID Agreement Date Agreement Date_x match
0 0 05/04/1997 05/04/1997 yes
1 1 03/02/1997 03/04/1997 no
2 2 05/07/2020 05/06/2021 no
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
