'How to use pandas and numpy to compare two excel workbooks with multiple tabs?

I have two xlsx files that have multiple tabs. I need to compare values in each tab based on the tab name. (e.g. sheet1 in file1 needs to be compared with sheet1 in file2 and so on). When I use the following code, it will only compare and write the first sheet. Please help me figure out why all tabs do not get compared.

import pandas as pd
import numpy as np

df1 = pd.read_excel('test_1.xlsx', sheet_name=None)
df2 = pd.read_excel('test_2.xlsx', sheet_name=None)

with pd.ExcelWriter('./Excel_diff.xlsx') as writer:
    for sheet, df1 in df1.items():
        # check if sheet is in the other Excel file
        if sheet in df2:
            df2 = df2[sheet]
            comparison_values = df1.values == df2.values

            print(comparison_values)

            rows, cols = np.where(comparison_values == False)
            for item in zip(rows, cols):
                df1.iloc[item[0], item[1]] = '{} → {}'.format(df1.iloc[item[0], item[1]], df2.iloc[item[0], item[1]])

            df1.to_excel(writer, sheet_name=sheet, index=False, header=True)

The test_1 file is set up as follows.

Sheet1

|test 1|test 2|test 3|
|------|------|------|
|1     |1     |1     |
|1     |1     |1     |
|1     |1     |1     |

Sheet2

|test 1|test 2|test 3|
|------|------|------|
|3     |3     |3     |
|3     |3     |3     |
|3     |3     |3     |

The test_2 file is set up as follows.

Sheet1

|test 1|test 2|test 3|
|------|------|------|
|2     |2     |2     |
|2     |2     |2     |
|2     |2     |2     |

Sheet2

|test 1|test 2|test 3|
|------|------|------|
|4     |4     |4     |
|4     |4     |4     |
|4     |4     |4     |

Here is the output I get using the code above.

Sheet1

|test 1|test 2|test 3|
|------|------|------|
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |

Here is the output I get if I align the df1.to_excel function with the 'if' statement.

Sheet1

|test 1|test 2|test 3|
|------|------|------|
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |

Sheet2

|test 1|test 2|test 3|
|------|------|------|
|3     |3     |3     |
|3     |3     |3     |
|3     |3     |3     |

Here is the output I want that shows the difference in values by sheet name.

Sheet1

|test 1|test 2|test 3|
|------|------|------|
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |

Sheet2

|test 1|test 2|test 3|
|------|------|------|
|3 → 4 |3 → 4 |3 → 4 |
|3 → 4 |3 → 4 |3 → 4 |
|3 → 4 |3 → 4 |3 → 4 |

Thanks!



Solution 1:[1]

With the help of a colleague I was able to troubleshoot the problem with my excel sheet comparison code. Within the 'if' loop, df2 was being overwritten. I changed the name from df2 to df2sheet within the 'if' loop and now it works beautifully.

import pandas as pd
import numpy as np

df1 = pd.read_excel('test_1.xlsx', sheet_name=None)
df2 = pd.read_excel('test_2.xlsx', sheet_name=None)

with pd.ExcelWriter('./Excel_diff.xlsx') as writer:
    for sheet, df1 in df1.items():
        # check if sheet is in the other Excel file
        if sheet in df2:
            df2sheet = df2[sheet]
            comparison_values = df1.values == df2sheet.values

            print(comparison_values)

            rows, cols = np.where(comparison_values == False)
            for item in zip(rows, cols):
                df1.iloc[item[0], item[1]] = '{} ? {}'.format(df1.iloc[item[0], item[1]], df2sheet.iloc[item[0], item[1]])

            df1.to_excel(writer, sheet_name=sheet, index=False, header=True)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Karessa Manning