AWS Glue pyspark dataframe to pandas null values problem
I am having the following problem with an AWS Glue job. I am trying to clean up a dataframe by filling null values. Out of 5 Spark dataframes, the script that fills null values works on 4 of them but not on the remaining one.
df_opp = df_opp.fillna({'opp_redraw_amount':'0','opp_loan_date':'1970-01-01', .....})
So I decided to print the dataframes as I convert them into pandas dataframes, and I noticed something that looks really bizarre to me and might be the reason the null fill was not working, but I don't know how to fix it.
The spark dataframe looks like this:
+-------------+--------------------+---------------------------+-----------------+
|opp_closedate|opp_contact_attempts|opp_days_since_last_payment|opp_edm_follow_up|
+-------------+--------------------+---------------------------+-----------------+
|   2019-03-12|                null|                       null|             null|
|   2020-08-22|                null|                       null|             null|
|   2019-08-02|                null|                       null|             null|
|   2018-08-02|                null|                       null|             null|
|   2019-04-09|                null|                       null|             null|
|   2019-05-01|                null|                       null|             null|
|   2019-03-13|                null|                       null|             null|
|   2019-07-29|                null|                       null|             null|
|   2020-12-04|                null|                       null|             null|
|   2017-09-12|                null|                       null|             null|
+-------------+--------------------+---------------------------+-----------------+
When I convert the dataframe to pandas and print it:
df_opp = dfc.select(list(dfc.keys())[2]).toDF()
df_opp.show(10)
pd_df_opp = df_opp.toPandas()
print(pd_df_opp.head(10))
I get a mix of None, null and NaN. I expected all of these values to be None rather than either of the other two:
       opp_contact_attempts  opp_days_since_last_payment  opp_edm_follow_up  \
40418                   NaN                          NaN               null
17225                   NaN                          NaN               null
6151                    NaN                          NaN               null
24383                   NaN                          NaN               null
43401                   NaN                          NaN               null
24462                   NaN                          NaN               null
45101                   NaN                          NaN               null
15675                   NaN                          NaN               null
43002                   NaN                          NaN               null
7838                    NaN                          NaN               null
Why do I have null in Spark, but sometimes None, null, or NaN in pandas? If I print the dtypes of the pandas dataframe I get object, and when I print the schema of the Spark dataframe the columns are string.
What am I missing, and how can I properly fill the null values?
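One possibility that would be consistent with the output above, though nothing in the question confirms it, is that some cells hold the literal string "null" rather than a real null. A fill operation that only targets real nulls would then skip them. The sketch below uses plain Python, with no Spark or pandas, to show that None, NaN and the string "null" are three different values and how they could be normalized before filling. The helper names is_missing and fill_missing, the NULL_STRINGS set, and the sample rows are all made up for illustration.

```python
import math

# Assumption: these strings are treated as "missing" markers.
# Adjust the set to whatever actually appears in the data.
NULL_STRINGS = {"null", "none", "nan"}


def is_missing(value):
    """Return True for None, a float NaN, or a null-like string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip().lower() in NULL_STRINGS:
        return True
    return False


def fill_missing(rows, defaults):
    """Return copies of the rows with missing values replaced by defaults.

    A column that is absent from a row is also treated as missing.
    The input rows are not modified.
    """
    filled = []
    for row in rows:
        new_row = dict(row)
        for column, default in defaults.items():
            if is_missing(new_row.get(column)):
                new_row[column] = default
        filled.append(new_row)
    return filled


# Made-up sample rows using two column names from the question.
rows = [
    {"opp_contact_attempts": None, "opp_edm_follow_up": "null"},
    {"opp_contact_attempts": float("nan"), "opp_edm_follow_up": "yes"},
]
defaults = {"opp_contact_attempts": "0", "opp_edm_follow_up": "unknown"}
result = fill_missing(rows, defaults)
```

If this is what is happening, the analogous step in Spark would be to turn such strings into real nulls before calling fillna. Checking whether the column contains the string "null" is a cheap way to confirm or rule out the assumption first.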
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
