Trying to repeat the same lines of code on different dataframes in Python

keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df2.set_index(keys).index
print(df1[~i1.isin(i2)])

I want to use the same lines of code to compare df2 with df3, df3 with df4, and df1 with df4, without repeating those lines.



Solution 1:[1]

A function is a named block of code with placeholders (parameters) that you define once and can then run as many times as you need, each time with different inputs.

# This is the operation you want to run
# dataframe1 and dataframe2 are placeholders for 2 dataframes
def check_indexes(dataframe1, dataframe2):
    keys = list(dataframe2.columns.values)
    i1 = dataframe1.set_index(keys).index
    i2 = dataframe2.set_index(keys).index
    print(dataframe1[~i1.isin(i2)])

# Now that the function is defined, run it by calling its name and passing
# the dataframes that should stand in for dataframe1 and dataframe2
check_indexes(df1, df2)
check_indexes(df2, df3)
check_indexes(df3, df4)
check_indexes(df1, df4)

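You can rename check_indexes to anything you like, as long as the name is a valid Python identifier: letters, digits, and underscores only, not starting with a digit, and not a reserved keyword. By convention, function names are lowercase words joined by underscores, and a verb usually reads best.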
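If you have many pairs to compare, one possible variation is to have the function return the rows instead of printing them, and then loop over a list of pairs. The following is only a sketch: the name rows_missing_from, the frames dictionary, and the sample dataframes are made up for illustration and are not part of the original question.

```python
import pandas as pd


def rows_missing_from(left, right):
    """Return the rows of `left` whose values in `right`'s columns do not occur in `right`."""
    keys = list(right.columns)
    left_index = left.set_index(keys).index
    right_index = right.set_index(keys).index
    return left[~left_index.isin(right_index)]


# Hypothetical sample data, only here to make the sketch runnable
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df3 = pd.DataFrame({"id": [2, 3], "name": ["b", "c"]})
df4 = pd.DataFrame({"id": [3], "name": ["c"]})

# Look the dataframes up by name so each comparison can be labelled
frames = {"df1": df1, "df2": df2, "df3": df3, "df4": df4}
pairs = [("df1", "df2"), ("df2", "df3"), ("df3", "df4"), ("df1", "df4")]

results = {}
for left_name, right_name in pairs:
    missing = rows_missing_from(frames[left_name], frames[right_name])
    results[(left_name, right_name)] = missing
    print(f"Rows in {left_name} but not in {right_name}:")
    print(missing)
```

Returning the dataframe rather than printing it keeps the function reusable: the caller decides whether to print the result, save it, or check whether it is empty.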

Here's some additional reading: https://www.w3schools.com/python/python_functions.asp

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jeff