PySpark: Extract one key/value from a JSON column
I've seen various questions about reading JSON from a column in PySpark. In all of them, it appears the entire schema has to be specified, from_json applied, and only then can keys be referenced easily.
I have a column containing a large number of unique keys, and I only want to extract the key/value pair for one key (which isn't present in all rows). Is there a quick and easy way (outside of some kind of regexp string-parsing function) to extract this value by name?
Solution 1:[1]
Answered my own question:
df.withColumn('newNameOfColumn', f.json_tuple("JSON_COLUMN_AS_STRING", "KEY_YOU_WISH_TO_EXTRACT"))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ben890 |
