Filling empty values in boolean columns in PySpark
I have a DataFrame with some boolean columns, and these columns are sometimes empty, just like columns of other data types.
I need to convert this DataFrame to an RDD in which each row becomes a JSON string. For that I use the code below:
df.toJSON().zipWithIndex()
However, when a row has a null in a certain column, that column is omitted from the generated JSON, which leaves me with a mismatched schema.
I have tried df.na.fill('').toJSON().zipWithIndex(), which handles the string columns, but the problem remains when a column is of int or boolean type.
How can I keep all the columns as keys in the json, even when the value is null?
Thanks!
Solution 1:[1]
I managed to fix it, but I'm leaving this here for anyone who might need it.
If you set the property ("spark.sql.jsonGenerator.ignoreNullFields", "false") when building your SparkSession, Spark keeps null values when generating JSON objects.
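A minimal sketch of that configuration (the app name, column names, and sample data here are illustrative, not from the question; the config key requires Spark 3.0 or later, where it defaults to "true"):

```python
from pyspark.sql import SparkSession

# Build the session with null fields kept in generated JSON.
# By default "spark.sql.jsonGenerator.ignoreNullFields" is "true",
# which drops null-valued keys from toJSON() output.
spark = (
    SparkSession.builder
    .appName("keep-null-json-keys")  # hypothetical app name
    .config("spark.sql.jsonGenerator.ignoreNullFields", "false")
    .getOrCreate()
)

# Illustrative DataFrame with a nullable boolean column.
df = spark.createDataFrame(
    [(1, True), (2, None)],
    ["id", "flag"],
)

# Every column now stays present as a key in each JSON row,
# with null as the value where the column was empty.
rdd = df.toJSON().zipWithIndex()
```

Since the fix is a session-level setting, it applies to every `toJSON()` call on that session without touching the DataFrames themselves.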
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | J.Doe |
