Both CSV files are identical, yet my Python script reports a mismatch
I have written a script that compares two CSV files and prints a mismatch message whenever their values differ.
Here is my script:
```python
import pandas as pd

def main():
    sheet1 = pd.read_csv(filepath_or_buffer='1.csv')
    sheet2 = pd.read_csv(filepath_or_buffer='2.csv')
    # Iterating the column names of both sheets
    for i, j in zip(sheet1, sheet2):
        # Creating empty lists to append the column values
        a, b = [], []
        # Iterating the column values
        for m, n in zip(sheet1[i], sheet2[j]):
            # Appending values in lists
            a.append(m)
            b.append(n)
        # Sorting the lists
        a.sort()
        b.sort()
        # Iterating the lists' values and comparing them
        for m, n in zip(range(len(a)), range(len(b))):
            if a[m] != b[n]:
                print('Column name : \'{}\' and Row Number : {}'.format(i, m))

if __name__ == '__main__':
    main()
```
My output is:

```
Column name : 'PL' and Row Number : 0
Column name : 'PL' and Row Number : 1
Column name : 'PL' and Row Number : 2
```
FYI: in both files, the PL column contains 'null' values, yet the script still reports a mismatch.
Can anyone help me pinpoint the cause, or suggest how to debug it?
Solution 1:[1]
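By default, `pd.read_csv` parses the string `null` as a missing value (NaN), and NaN never compares equal to anything, including itself. So `a[m] != b[n]` is `True` even when both cells are empty. You should skip the pair when both values are missing:

```python
if a[m] != b[n] and not (pd.isna(a[m]) and pd.isna(b[n])):
    print('Column name : \'{}\' and Row Number : {}'.format(i, m))
```

Use `pd.isna` rather than an identity check such as `a[m] is not np.nan`: a NaN read from a file is not necessarily the same object as `np.nan`, so the identity check can miss it.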
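Below is a minimal, self-contained sketch of the question's comparison loop with that missing-value check applied. It assumes two small in-memory CSVs (read through `io.StringIO` as hypothetical stand-ins for `1.csv` and `2.csv`) whose `PL` column holds only `null`. The helper name `find_mismatches` is also hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for 1.csv and 2.csv: identical data, PL is all "null".
CSV_TEXT = "ID,PL\n1,null\n2,null\n3,null\n"


def find_mismatches(sheet1, sheet2):
    """Return (column, row) pairs where the sorted column values differ."""
    mismatches = []
    for col1, col2 in zip(sheet1, sheet2):
        a = sorted(sheet1[col1])
        b = sorted(sheet2[col2])
        for row, (x, y) in enumerate(zip(a, b)):
            # NaN != NaN is True, so treat "both missing" as a match.
            if x != y and not (pd.isna(x) and pd.isna(y)):
                mismatches.append((col1, row))
    return mismatches


sheet1 = pd.read_csv(io.StringIO(CSV_TEXT))
sheet2 = pd.read_csv(io.StringIO(CSV_TEXT))

print(sheet1["PL"].isna().all())        # "null" was parsed as NaN
print(float("nan") != float("nan"))     # why the original check fires
print(find_mismatches(sheet1, sheet2))  # no mismatches reported
```

Mixing NaN with real numbers in one column can still make `sorted` produce an inconsistent order, because every comparison with NaN returns `False`. The sketch only addresses the all-`null` case described in the question.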
Solution 2:[2]
You should probably check out Pandas DataFrame Compare (`DataFrame.compare`).
It reports where two DataFrames differ.
A small example:
```python
import pandas as pd

d1 = {"key": [1, 2, 3, 4, 5], "key2": [5, 6, 7, 8, 9]}
d2 = {"key": [1, 2, 3, 4, 6], "key2": [5, 6, 7, 8, 10]}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
df1.compare(df2)
```
`df1.compare(df2)` returns a DataFrame with a two-level column index, like this:

```
   key       key2      
  self other self other
4  5.0   6.0  9.0  10.0
```
However, `compare` only works on identically labeled DataFrames (same index and column labels).
If the column names are not exactly the same, select the columns both DataFrames share before comparing, e.g. `df1[["key", "key2"]].compare(df2[["key", "key2"]])`.
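As a sketch of how `DataFrame.compare` (available since pandas 1.1.0) handles the question's situation, the example below uses hypothetical CSV contents read from `io.StringIO` in place of `1.csv`/`2.csv`. `compare` treats NaN values in the same position on both sides as equal, so an all-`null` `PL` column produces no differences. The second half assumes one file has an extra, hypothetical `Extra` column, and it selects the shared columns first.

```python
import io

import pandas as pd

# Hypothetical file contents: identical data, PL holds only "null".
text1 = "ID,PL\n1,null\n2,null\n3,null\n"
text2 = "ID,PL\n1,null\n2,null\n3,null\n"

df1 = pd.read_csv(io.StringIO(text1))
df2 = pd.read_csv(io.StringIO(text2))

# NaN in the same position on both sides counts as equal,
# so identical files give an empty result.
diff = df1.compare(df2)
print(diff.empty)

# df1.compare(df3) would raise ValueError because the column labels differ,
# so restrict both frames to the columns they share.
df3 = pd.read_csv(io.StringIO("ID,PL,Extra\n1,null,a\n2,null,b\n4,null,c\n"))
shared = [c for c in df1.columns if c in df3.columns]
print(df1[shared].compare(df3[shared]))
```

Checking `.empty` on the result gives a simple "files match" test without writing the comparison loop by hand.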
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
