pandas - what is the orient parameter of read_json?

pandas.read_json has an orient parameter, but I am not sure what the documentation is trying to explain.

Please help me understand what it does.

orient: str

Indication of expected JSON string format. Compatible JSON strings can be produced by to_json() with a corresponding orient value. The set of possible orients is:

  • 'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
  • 'records' : list like [{column -> value}, ... , {column -> value}]
  • 'index' : dict like {index -> {column -> value}}
  • 'columns' : dict like {column -> {index -> value}}
  • 'values' : just the values array

I tested the JSON below while changing the parameter, but I am not sure what it is doing. orient='index' throws AttributeError: 'list' object has no attribute 'values' and orient='split' throws AttributeError: 'list' object has no attribute 'values', and I have no idea what it is complaining about.

data = """[
{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
},
{
    "glossary": {
        "title": "example glossary 2",
        "GlossDiv": {
            "title": "K",
            "GlossList": {
                "GlossEntry": {
                    "ID": "XML",
                    "SortAs": "XML",
                    "GlossTerm": "eXtensible Markup Language",
                    "Acronym": "XML",
                    "Abbrev": "ISO ",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}
]"""

pd.read_json(
    data,
    orient='value'
)


Solution 1:[1]

Your data does not seem to fit any of the orientations available in read_json.

To understand when to use each orientation, look at the examples in the documentation of this method (https://pandas.pydata.org/docs/reference/api/pandas.read_json.html).
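
As a rough illustration of the mismatch, the sketch below feeds a shortened, made-up stand-in for your input (the `sample` string and the `outcome` dict are my own names, not part of pandas) to read_json under several orients and records which ones parse. It assumes a reasonably recent pandas; the exact exception type and message may differ between versions.

```python
from io import StringIO

import pandas as pd

# Hypothetical, shortened stand-in for the question's data: the top level is a list.
sample = (
    '[{"glossary": {"title": "example glossary"}},'
    ' {"glossary": {"title": "example glossary 2"}}]'
)

outcome = {}
for orient in ["records", "index", "split"]:
    try:
        frame = pd.read_json(StringIO(sample), orient=orient)
        outcome[orient] = "ok"
        print(orient, "-> parsed, shape", frame.shape)
    except Exception as exc:  # the concrete exception type may vary by pandas version
        outcome[orient] = type(exc).__name__
        print(orient, "-> failed:", type(exc).__name__, exc)
```

'index' and 'split' are documented as dict-like layouts, so a string whose top level is a list cannot be read with them. 'records' does parse, but each nested glossary object then ends up as a single dict inside one column, which is probably not what you want.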
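
If it helps, here is a minimal sketch of what the five orients look like for one small frame. The frame and its labels (a, b, x, y) are invented for the illustration; to_json writes each layout and read_json reads one of them back with the matching orient.

```python
import json
from io import StringIO

import pandas as pd

# Illustrative frame; the labels a, b, x, y are made up for this sketch.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# Serialize the same frame once per orient to compare the JSON layouts.
layouts = {
    orient: df.to_json(orient=orient)
    for orient in ["split", "records", "index", "columns", "values"]
}
for orient, text in layouts.items():
    # Show whether the top level of each layout is a list or a dict.
    top_level = type(json.loads(text)).__name__
    print(f"{orient:8} top-level {top_level}: {text}")

# Reading back works when orient matches the layout of the string.
restored = pd.read_json(StringIO(layouts["split"]), orient="split")
print(restored)
```

Only 'records' and 'values' have a list at the top level; 'split', 'index' and 'columns' all start with a dict.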

In your case you could start from:

import json

result = pd.json_normalize(json.loads(data))

This function at least drills down properly into the nested structure of your JSON input.
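
To give an idea of the result, here is a small sketch on a trimmed-down, made-up version of your input (the `raw` string is my own shortening, not your full data). Nested keys are joined with a dot into column names, and the max_level argument of json_normalize limits how deep the flattening goes.

```python
import json

import pandas as pd

# Hypothetical, trimmed-down version of the question's input.
raw = """[
  {"glossary": {"title": "example glossary", "GlossDiv": {"title": "S"}}},
  {"glossary": {"title": "example glossary 2", "GlossDiv": {"title": "K"}}}
]"""

# Full flattening: every nested dict becomes dotted column names.
flat = pd.json_normalize(json.loads(raw))
print(sorted(flat.columns))
print(flat["glossary.GlossDiv.title"].tolist())

# max_level=1 flattens one level only; deeper dicts stay as objects in a column.
shallow = pd.json_normalize(json.loads(raw), max_level=1)
print(sorted(shallow.columns))
```

With your full data the same idea yields one row per list element and columns such as glossary.GlossDiv.GlossList.GlossEntry.ID.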

A similar approach, omitting the first level (glossary) from column names, is:

wrk = pd.read_json(data)
df = pd.json_normalize(wrk.glossary)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow