Trying to repeat the same lines of code on different dataframes in Python
keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df2.set_index(keys).index
print(df1[~i1.isin(i2)])
I want to use the same lines of code to compare df2 with df3, df3 with df4, and df1 with df4, without repeating those lines each time.
Solution 1:[1]
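A function is a named block of code that you define once and run later, as many times as you like. Its parameters act as placeholders for the values you pass in when you call it.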
# This is the operation you want to run.
# dataframe1 and dataframe2 are placeholders for 2 dataframes.
def check_indexes(dataframe1, dataframe2):
    # Use every column of dataframe2 as the comparison key
    keys = list(dataframe2.columns.values)
    i1 = dataframe1.set_index(keys).index
    i2 = dataframe2.set_index(keys).index
    # Print the rows of dataframe1 whose key values do not appear in dataframe2
    print(dataframe1[~i1.isin(i2)])
# Now that the function is defined, run it by calling its name and passing
# the dataframes that should take the place of dataframe1 and dataframe2.
check_indexes(df1, df2)
check_indexes(df1, df3)
check_indexes(df2, df3)
# ... etc
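You can rename check_indexes to whatever you want, as long as the name is a valid Python identifier: letters, digits, and underscores only, no spaces, and not starting with a digit. By convention, function names are lowercase with words separated by underscores. I recommend the name be a verb of some kind that describes what the function does.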
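If you would rather not write one call per pair, a loop over the pairs is one option. The following is only a sketch under a few assumptions: the name rows_missing_from, the sample dataframes, and the frames and pairs variables are all made up for illustration, and the function returns the rows instead of printing them so the results can be reused.

```python
import pandas as pd


def rows_missing_from(left, right):
    """Return the rows of `left` whose values in `right`'s columns are absent from `right`."""
    keys = list(right.columns)
    left_index = left.set_index(keys).index
    right_index = right.set_index(keys).index
    return left[~left_index.isin(right_index)]


# Hypothetical sample data, standing in for the real df1..df4
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df3 = pd.DataFrame({"id": [2, 3], "name": ["b", "c"]})
df4 = pd.DataFrame({"id": [3], "name": ["c"]})

frames = {"df1": df1, "df2": df2, "df3": df3, "df4": df4}

# The pairs from the question: df2 vs df3, df3 vs df4, df1 vs df4
pairs = [("df2", "df3"), ("df3", "df4"), ("df1", "df4")]

results = {}
for left_name, right_name in pairs:
    result = rows_missing_from(frames[left_name], frames[right_name])
    results[(left_name, right_name)] = result
    print(f"Rows in {left_name} but not in {right_name}:")
    print(result)
```

Returning the dataframe keeps the printing separate from the comparison, so the same function also works if you later want to save or count the differing rows.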
Here's some additional reading: https://www.w3schools.com/python/python_functions.asp
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jeff |
