PySpark JSON to DataFrame schema
I have a tricky JSON document which I would like to load into a DataFrame, and I need assistance with how to define a schema for it:
{
"1-john": {
"children": ["jack", "jane", "jim"]
},
"2-chris": {
"children": ["bill", "will"]
}
}
DataFrame output needed:
| ID | father | children |
|---|---|---|
| 1 | john | ["jack", "jane", "jim"] |
| 2 | chris | ["bill", "will"] |
Solution 1:[1]
In the case of pandas, use:
import json
import pandas as pd

# d is the dict shown in the question
t = json.dumps(d)
df = pd.read_json(t, orient='index')

# split the "1-john" index into its ID and father parts;
# assigning from the index avoids misalignment with the string row labels
df['ID'] = df.index.str.split('-').str[0]
df['father'] = df.index.str.split('-').str[1]
You can then convert this to a PySpark DataFrame:
df_sp = spark_session.createDataFrame(df)
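Alternatively, you can skip pandas entirely and build the rows in plain Python before handing them to Spark. A minimal sketch, assuming the JSON text is already in a string and `spark_session` is an active SparkSession:

```python
import json

# raw stands in for the JSON document from the question
raw = """{
  "1-john": {"children": ["jack", "jane", "jim"]},
  "2-chris": {"children": ["bill", "will"]}
}"""

# split each top-level key such as "1-john" into an ID and a father name
rows = [
    {"ID": key.split("-", 1)[0],
     "father": key.split("-", 1)[1],
     "children": value["children"]}
    for key, value in json.loads(raw).items()
]

# the list of dicts can then be handed straight to Spark, e.g.:
# df_sp = spark_session.createDataFrame(rows)
```

Splitting with `split("-", 1)` keeps names intact even if a name itself contains a hyphen.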
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | keramat |
