pandas: check for values duplicated between two columns, not within one column
I have two columns and want to check whether values are repeated between them, not within a single column. The two datasets differ in length. I am using
df2['columnA'] = df1['columnA'].isin(df2['columnA'])
but it gives me the wrong answer.
I want to check whether values from the longer dataset also appear in the shorter dataset. If they do, I want a column added to the shorter dataset indicating True; if not, False.
Dataset1:
columnA
1598618777
553834731
1562313985
1138106620
1463509237
1560632350
Dataset2:
ColumnA
1330011201
1464235676
1232080731
1446254576
1563383895
1402595440
1555409735
1551787372
1523820531
1138106620
1196764367
1551787372
Solution 1:[1]
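You can combine the two datasets into one DataFrame with pd.concat (DataFrame.append was removed in pandas 2.0),
then use duplicated to flag the duplicates. If you want to remove them, use .drop_duplicates.
import pandas as pd
# assumes both frames name the column 'ColumnA'
df = pd.concat([Dataset1, Dataset2], ignore_index=True)
df.duplicated(subset=['ColumnA'])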
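Note that duplicated on a combined frame also flags values repeated within a single dataset (1551787372 appears twice in Dataset2), which the question wanted to exclude. A minimal sketch of an alternative, assuming the sample data from the question and a hypothetical new column name `in_dataset2`: call `isin` on the shorter dataset's column and pass the longer dataset's column as the lookup values. `isin` tests membership value by value, so the differing lengths and indexes do not matter.

```python
import pandas as pd

# Sample data copied from the question; the column names differ in case.
dataset1 = pd.DataFrame({"columnA": [
    1598618777, 553834731, 1562313985,
    1138106620, 1463509237, 1560632350,
]})
dataset2 = pd.DataFrame({"ColumnA": [
    1330011201, 1464235676, 1232080731, 1446254576,
    1563383895, 1402595440, 1555409735, 1551787372,
    1523820531, 1138106620, 1196764367, 1551787372,
]})

# Flag each row of the shorter dataset whose value occurs anywhere in the
# longer one. "in_dataset2" is an illustrative column name, not from the source.
dataset1["in_dataset2"] = dataset1["columnA"].isin(dataset2["ColumnA"])

print(dataset1)
```

With this data only 1138106620 is flagged True. The attempt in the question assigns the result back onto `df2['columnA']`, which overwrites the original values and aligns the boolean result on df1's index rather than df2's rows; writing to a new column on the frame being tested avoids both problems.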
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | DataSciRookie |