PySpark JSON to DataFrame schema
I have a tricky JSON document which I would like to load into a DataFrame, and I need assistance with how to define a schema for it:
{
"1-john": {
"children": ["jack", "jane", "jim"]
},
"2-chris": {
"children": ["bill", "will"]
}
}
DataFrame output needed:
| ID | father | children |
|---|---|---|
| 1 | john | ["jack", "jane", "jim"] |
| 2 | chris | ["bill", "will"] |
Solution 1:[1]
In the case of pandas, use:
import json
import pandas as pd

# d is the dict shown in the question
t = json.dumps(d)
df = pd.read_json(t, orient='index')

# split the "1-john" index into its ID and father parts;
# assigning from the index avoids misalignment with the string row labels
df['ID'] = df.index.str.split('-').str[0]
df['father'] = df.index.str.split('-').str[1]
You can then convert this to a PySpark DataFrame:
df_sp = spark_session.createDataFrame(df)
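Alternatively, you can skip pandas entirely and build the rows in plain Python before handing them to Spark. A minimal sketch, assuming the JSON text is already in a string and `spark_session` is an active SparkSession:

```python
import json

# raw stands in for the JSON document from the question
raw = """{
  "1-john": {"children": ["jack", "jane", "jim"]},
  "2-chris": {"children": ["bill", "will"]}
}"""

# split each top-level key such as "1-john" into an ID and a father name
rows = [
    {"ID": key.split("-", 1)[0],
     "father": key.split("-", 1)[1],
     "children": value["children"]}
    for key, value in json.loads(raw).items()
]

# the list of dicts can then be handed straight to Spark, e.g.:
# df_sp = spark_session.createDataFrame(rows)
```

Splitting with `split("-", 1)` keeps names intact even if a name itself contains a hyphen.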
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | keramat |
