PySpark: Extract one key/value from a JSON column

I've seen various questions about reading JSON from a column in PySpark. In all of them, the entire schema has to be specified, from_json is applied, and then keys can be referenced easily.

I have a column containing a large number of unique keys, and I only want to extract the value for one key (which isn't present in all rows). Is there a quick and easy way (other than some kind of regexp string-parsing function) to extract this key/value by name?



Solution 1:[1]

Answered my own question:

from pyspark.sql import functions as f

# Extracts the value of the named key from a JSON string column;
# rows where the key is absent get null. No schema is required.
df = df.withColumn("newNameOfColumn", f.json_tuple("JSON_COLUMN_AS_STRING", "KEY_YOU_WISH_TO_EXTRACT"))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: ben890