pandas - what is the orient parameter of read_json?
pandas.read_json has an orient parameter, but I am not sure what the documentation is trying to explain.
Please help me understand what it does.
orient: str
Indication of expected JSON string format. Compatible JSON strings can be produced by to_json() with a corresponding orient value. The set of possible orients is:
- 'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
- 'records' : list like [{column -> value}, ... , {column -> value}]
- 'index' : dict like {index -> {column -> value}}
- 'columns' : dict like {column -> {index -> value}}
- 'values' : just the values array
I tested the JSON below while changing the parameter, but I am not sure what it is doing. orient='index' throws the error AttributeError: 'list' object has no attribute 'values', and orient='split' throws the same AttributeError: 'list' object has no attribute 'values', but I have no idea what it is complaining about.
data = """[
{
"glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Standard Generalized Markup Language",
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-markup language, used to create markup languages such as DocBook.",
"GlossSeeAlso": ["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
},
{
"glossary": {
"title": "example glossary 2",
"GlossDiv": {
"title": "K",
"GlossList": {
"GlossEntry": {
"ID": "XML",
"SortAs": "XML",
"GlossTerm": "eXtensible Markup Language",
"Acronym": "XML",
"Abbrev": "ISO ",
"GlossDef": {
"para": "A meta-markup language, used to create markup languages such as DocBook.",
"GlossSeeAlso": ["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
}
]
pd.read_json(
    data,
    orient='values'
)
Solution 1:[1]
Your data does not seem to fit any of the orientations available in read_json.
To understand when to use each orientation, look at the examples in the documentation of this method (https://pandas.pydata.org/docs/reference/api/pandas.read_json.html).
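As an illustration (a minimal sketch of my own, not part of the original answer), the quickest way to see what each orient means is to round-trip a tiny DataFrame through to_json and read_json with the same orient:
import io
import pandas as pd

# A tiny frame, used only to show the shape each orient describes.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

for orient in ["split", "records", "index", "columns", "values"]:
    s = df.to_json(orient=orient)
    print(orient, "->", s)
    # The same orient has to be passed back to read_json to parse the string.
    print(pd.read_json(io.StringIO(s), orient=orient))
Running this shows, for example, that 'records' is a flat list of row dicts while 'split' separates index, columns and data; the deeply nested JSON in the question matches none of these layouts, which is why orient='index' and orient='split' fail on it.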
In your case you could start with:
result = pd.json_normalize(json.loads(data))
(import json is needed).
This function at least properly drills down into the structure of your JSON input.
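A fuller, self-contained version of that suggestion could look like the sketch below (the variable name result and the printed output are mine; the dotted column names come from json_normalize's default separator):
import json
import pandas as pd

# 'data' is the JSON string defined in the question.
result = pd.json_normalize(json.loads(data))

# One row per element of the top-level list; nested keys become dotted columns,
# e.g. 'glossary.title', 'glossary.GlossDiv.title',
# 'glossary.GlossDiv.GlossList.GlossEntry.ID', and so on.
print(result.columns.tolist())
print(result)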
A similar approach, which omits the first level (glossary) from the column names, is:
wrk = pd.read_json(data)
df = pd.json_normalize(wrk.glossary)
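A hedged sketch of this second route, again assuming the data string from the question (on pandas 2.1 and later, passing a literal JSON string to read_json is deprecated, so the string is wrapped in io.StringIO here):
import io
import pandas as pd

# Default orient: each top-level dict becomes a row with a single 'glossary'
# column that still holds the nested dicts.
wrk = pd.read_json(io.StringIO(data))

# Flattening that column drops the leading 'glossary.' prefix from the names,
# e.g. 'title', 'GlossDiv.title', 'GlossDiv.GlossList.GlossEntry.ID', ...
df = pd.json_normalize(wrk.glossary)
print(df.columns.tolist())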
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
