How do I remove repeated row comparisons?

I'm new to coding and need to do pairwise comparisons using Pandas, so I have to find a way to code row-by-row comparisons without any repetitions.

The mock data is as follows:

output:

Here I'm comparing males and their ages. However, as the output above shows, index 1 contains the combination of Vyel and Allsebrook, and index 4 contains the same combination in reverse order, Allsebrook and Vyel.

Ideally, the desired output would be like:

Desired Results

I have managed to remove rows that contain the same person twice, but is there a way to avoid comparing the same pair of people more than once? I would appreciate any feedback. Thank you!



Solution 1:[1]

Try this:

import pandas as pd

# Create the DF
id_x = [1, 1, 1, 1, 2]
last_name_x = ["Vyel", "Vyel", "Vyel", "Vyel", "Allsebrook"]
gender = ["Male", "Male", "Male", "Male", "Male"]
age_x = [66, 66, 66, 66, 50]
id_y = [1, 2, 3, 4, 1]
last_name_y = ["Vyel", "Allsebrook", "Prinett", "Jinda", "Vyel"]
age_y = [66, 50, 30, 31, 66]

df = pd.DataFrame(id_x, columns=['id_x'])
df['last_name_x'] = last_name_x
df['gender'] = gender
df['age_x'] = age_x
df['id_y'] = id_y
df['last_name_y'] = last_name_y
df['age_y'] = age_y

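# Build one key per side of the comparison so mirrored pairs can be detected
df["key_x"] = df["id_x"].astype(str) + df["last_name_x"] + df["age_x"].astype(str)
df["key_y"] = df["id_y"].astype(str) + df["last_name_y"] + df["age_y"].astype(str)
df["combined"] = df['key_x'] + "|" + df['key_y']
# Every pair written in reverse order (y|x); a row whose "combined" value is in this set has a mirror
s = set(df['key_y'] + "|" + df['key_x'])
# Reversed forms of the pairs that have already been kept
d = {}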


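# Mark the second occurrence of each mirrored pair
def check_if_duplicate(row):
    # Only pairs whose reverse also occurs in the frame can be duplicates
    if row in s:
        if row in d:
            # The reverse of this pair was already kept, so flag this row
            return True
        else:
            # First time this pair is seen: remember its reverse for later rows
            arr = row.split("|")
            d[arr[1] + "|" + arr[0]] = 1
    return False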


df['duplicate'] = df['combined'].apply(check_if_duplicate)

# Drop the flagged rows and the helper columns that were added to find them
df = df[~df['duplicate']].drop(columns=["key_x", "key_y", "combined", "duplicate"])

print(df)
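
A shorter variant of the same idea, offered only as a sketch: it assumes that the id columns alone identify a person, which the question does not confirm. Each row gets an order-independent key (smaller id first, larger id second), and `DataFrame.duplicated` then flags every later row whose unordered pair has already appeared. The names `pair_lo`, `pair_hi`, `is_mirror` and `deduped` are made up for this example.

```python
import pandas as pd

# Mock data mirroring the frame built in the answer above.
df = pd.DataFrame({
    "id_x": [1, 1, 1, 1, 2],
    "last_name_x": ["Vyel", "Vyel", "Vyel", "Vyel", "Allsebrook"],
    "gender": ["Male"] * 5,
    "age_x": [66, 66, 66, 66, 50],
    "id_y": [1, 2, 3, 4, 1],
    "last_name_y": ["Vyel", "Allsebrook", "Prinett", "Jinda", "Vyel"],
    "age_y": [66, 50, 30, 31, 66],
})

# Assumption: id_x / id_y uniquely identify a person.
# Order-independent key: smaller id first, larger id second.
ids = df[["id_x", "id_y"]].to_numpy()
pair_lo = ids.min(axis=1)
pair_hi = ids.max(axis=1)

# True for every row whose unordered pair already appeared in an earlier row.
is_mirror = pd.DataFrame(
    {"lo": pair_lo, "hi": pair_hi}, index=df.index
).duplicated()

# Keep only the first occurrence of each unordered pair.
deduped = df[~is_mirror]
print(deduped)
```

On the mock data this keeps the first four rows and drops the last one (Allsebrook vs. Vyel). Note that it also treats an exact repeat of the same ordered pair as a duplicate, which may or may not be what you want.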

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tomer S