How to process a JSON file with column headers and data in separate lists in Spark

I have a JSON file as follows:

{"columns": ["Name", "City", "DOB"],
 "data": [
     ["ABC", "Georgia", "01/05/1987"],
     ["ABC", "Kansas", "10/11/1989"]]}

How can this file be processed in PySpark so that it loads into a table with Name, City, and DOB columns in the proper format? Do I have to transform it into the usual record-oriented JSON first and then load it, or is there another way to handle this?



Solution 1:[1]

I tried the approach below and it works. I'm not sure if it's the best way; I'm open to suggestions or improvements.

data = {"columns": ['Name', 'City', 'DOB'],
        "data":[
                  ['ABC', 'Georgia', '01/05/1987'],
                  ['ABC', 'Kansas', '10/11/1989']
              ]
       }
import json
import pandas as pd

# Convert the dict to a JSON string, then read it into a pandas DataFrame.
# orient='split' tells pandas to expect separate "columns" and "data" keys.
json_data = json.dumps(data)
pandasDF = pd.read_json(json_data, orient='split')

# Convert the pandas DataFrame to a Spark DataFrame
df = spark.createDataFrame(pandasDF)
df.show(truncate=False)

+----+-------+----------+
|Name|   City|       DOB|
+----+-------+----------+
| ABC|Georgia|01/05/1987|
| ABC| Kansas|10/11/1989|
+----+-------+----------+
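Another option that skips the pandas hop entirely: zip the column names with each data row to get the usual record-style dicts, write them out as newline-delimited JSON, and point spark.read.json at the result. A minimal sketch using only the standard library (the raw string and the file name records.json are illustrative, and it assumes an active SparkSession named spark for the final step):

```python
import json

# Illustrative input matching the question's structure
raw = '''{"columns": ["Name", "City", "DOB"],
          "data": [["ABC", "Georgia", "01/05/1987"],
                   ["ABC", "Kansas", "10/11/1989"]]}'''

payload = json.loads(raw)

# Pair each data row with the column names to build record-style dicts
records = [dict(zip(payload["columns"], row)) for row in payload["data"]]

# Write newline-delimited JSON, the layout spark.read.json expects
with open("records.json", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Then in PySpark:
# df = spark.read.json("records.json")
```

This avoids pulling the whole dataset through pandas, which may matter if the file is large; the trade-off is an extra pass to rewrite the file.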

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 DKNY