Python nan and None
I am trying to implement SCD2 (slowly changing dimension, type 2) in Python. The source files are .xlsx files and will be loaded into PostgreSQL.
I identify the changed rows in the merged dataframe with this condition:
ChangedRecords = np.where((All_Data.is_current_x.notnull()) & (All_Data.is_current_y.notnull())
& ((All_Data['col1_x'] != All_Data['col1_y'])
| (All_Data['col2_x'] != All_Data['col2_y'])
| (All_Data['col3_x'] != All_Data['col3_y'])
... ),1,0)
The issue is that NULL values in the source (.xlsx) are displayed as nan, while NULL values in the destination (PostgreSQL) are displayed as None.
My code therefore treats these values as different, and any row containing a NULL value ends up in ChangedRecords, which is wrong.
The datatype of the column is String.
I tried the code below to replace them, but it does not seem to work.
Source = ftp_invoices.replace([np.nan],[None])
I still get nan for the NULL values.
Solution 1:[1]
I think nan is a string and not NaN (note the case).
Try:
Source = ftp_invoices.replace('nan', None)
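Before replacing anything, it may help to confirm which kind of "nan" the column actually holds. The following is a minimal sketch, assuming an object-dtype column; the Series `col` is a made-up stand-in for one column of ftp_invoices, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one string column of ftp_invoices
col = pd.Series([None, np.nan, "nan", "abc"], dtype=object)

# Count the Python type behind each value:
# None -> NoneType, a real NaN -> float, the text 'nan' -> str
type_counts = col.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Flag the rows that hold the literal string 'nan'
is_nan_string = col.eq("nan")
print(is_nan_string.tolist())  # [False, False, True, False]
```

If the `str` count includes values that print as nan, the column contains the text 'nan' rather than a real NaN, and replacing np.nan alone will not touch it.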
Example:
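import numpy as np
import pandas as pd
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'A': [None, np.nan, 'nan']})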
print(df)
# Output
A
0 None # <- None
1 NaN # <- np.nan
2 nan # <- the string nan
# Replace real NaN by None
>>> df.replace(np.nan, None)
A
0 None
1 None # Only here
2 nan
# Replace nan strings by None
>>> df.replace('nan', None)
A
0 None
1 NaN
2 None # Only here
# Combine all
>>> df.replace({'nan': None, np.nan: None})
A
0 None
1 None
2 None
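As an alternative to normalising both sides to None, the change detection from the question could compare columns in a null-safe way. This is only a sketch under stated assumptions: `all_data` is a made-up frame standing in for All_Data, and `differs` is a hypothetical helper, not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical merged frame standing in for All_Data (one column pair only)
all_data = pd.DataFrame(
    {
        "col1_x": ["a", np.nan, "b", "nan"],  # source side (.xlsx)
        "col1_y": ["a", None, "c", None],     # destination side (PostgreSQL)
    },
    dtype=object,
)

def differs(left: pd.Series, right: pd.Series) -> pd.Series:
    """True where values differ; NaN, None and the string 'nan' all count as missing."""
    # Turn the literal string 'nan' into a real missing value
    left = left.mask(left.eq("nan"))
    right = right.mask(right.eq("nan"))
    # A pair where both sides are missing is not a change
    both_missing = left.isna() & right.isna()
    return (left != right) & ~both_missing

changed = np.where(differs(all_data["col1_x"], all_data["col1_y"]), 1, 0)
print(changed.tolist())  # [0, 0, 1, 0]
```

Only the third row is flagged, because "b" and "c" really differ. The rows where both sides are missing are no longer reported as changed.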
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |
