Both CSV files are identical, yet I get a mismatch error in my Python code output

I have written a script that compares two CSV files and reports a mismatch wherever one occurs.

So, below is my script:

import pandas as pd
def main():
    sheet1 = pd.read_csv(filepath_or_buffer = '1.csv')
    sheet2 = pd.read_csv(filepath_or_buffer = '2.csv')
    # Iterating the Columns Names of both Sheets
    for i,j in zip(sheet1,sheet2):

        # Creating empty lists to append the columns values
        a,b =[],[]

        # Iterating the columns values
        for m, n in zip(sheet1[i],sheet2[j]):

            # Appending values in lists
            a.append(m)
            b.append(n)

        # Sorting the lists
        a.sort()
        b.sort()

        # Iterating the list's values and comparing them
        for m, n in zip(range(len(a)), range(len(b))):
            if a[m] != b[n]:
                print('Column name : \'{}\' and Row Number : {}'.format(i,m))




if __name__ == '__main__':
    main()

My output is:

Column name : 'PL' and Row Number : 0
Column name : 'PL' and Row Number : 1
Column name : 'PL' and Row Number : 2

FYI: in both files, the PL column contains only 'null' values, yet the script still reports a mismatch.

Can anyone help me pinpoint how to debug this?



Solution 1:[1]

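NaN never compares equal to itself, so two missing values always register as a mismatch. You should also check that the two values are not both NaN. pd.isna is more reliable here than an identity check against np.nan, and pd is already imported in your script:

if a[m] != b[n] and not (pd.isna(a[m]) and pd.isna(b[n])):
    print('Column name : \'{}\' and Row Number : {}'.format(i,m))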
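
To see why the original check fires on identical files, here is a minimal sketch of a NaN-aware comparison. The helper name values_differ is hypothetical (it is not part of pandas), and the sample values are invented for illustration.

```python
import numpy as np
import pandas as pd


def values_differ(x, y):
    """Return True if x and y differ, treating two missing values as equal.

    Hypothetical helper for illustration; not part of pandas.
    """
    if pd.isna(x) and pd.isna(y):
        return False
    return bool(x != y)


nan_a = float("nan")
nan_b = np.float64("nan")

print(nan_a != nan_b)               # True: NaN is never equal to NaN
print(values_differ(nan_a, nan_b))  # False: both sides are missing
print(values_differ(1.0, nan_b))    # True: only one side is missing
print(values_differ(2, 2))          # False: ordinary equal values
```

Calling values_differ(a[m], b[n]) in place of a[m] != b[n] in the loop from the question should then suppress the reports for cells that are missing in both files.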

Solution 2:[2]

Take a look at pandas' DataFrame.compare.

It shows where two DataFrames differ.

A small example:

import pandas as pd

d1 = { "key":[1,2,3,4,5], "key2":[5,6,7,8,9]}
d2 = { "key": [1,2,3,4,6], "key2":[5,6,7,8,10]}

df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

df1.compare(df2)

This outputs a DataFrame with a two-level column index, as follows:

   key       key2
  self other self other
4  5.0   6.0  9.0  10.0

However, this only works for DataFrames with identical labels (the same index and the same column names). If only some of the columns match, select them first, e.g. df1[["key","key2"]].compare(df2[["key","key2"]]).
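
Applied to the situation in the question, a sketch might look like the following. The CSV contents are invented stand-ins for 1.csv and 2.csv (read from strings so the example is self-contained), and it assumes read_csv is called with its default settings, under which the text 'null' is parsed as a missing value.

```python
import io

import pandas as pd

# Stand-ins for 1.csv and 2.csv; the contents are invented for illustration.
csv_1 = "ID,PL\n1,null\n2,null\n3,null\n"
csv_2 = "ID,PL\n1,null\n2,null\n3,null\n"

sheet1 = pd.read_csv(io.StringIO(csv_1))
sheet2 = pd.read_csv(io.StringIO(csv_2))

# With default settings, read_csv turns the text 'null' into NaN.
print(sheet1["PL"].isna().all())             # True

# Element-wise != flags every NaN cell, mirroring the output in the question.
print((sheet1["PL"] != sheet2["PL"]).sum())  # 3

# equals() and compare() treat NaN in the same position as equal.
print(sheet1.equals(sheet2))                 # True
diff = sheet1.compare(sheet2)
print(diff.empty)                            # True: no differences reported
```

If a plain yes/no answer is enough, equals is the shorter route; compare is more useful when you need to know which cells differ.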

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2