How to convert dataframe column to dictionary
Firstly, I want to thank everybody in advance for any help! I have 4 tables; I joined them and got a PySpark DataFrame. One of the DataFrame columns looks like this, and it has about 200 000 records:
{"table_name":"BTR.DAILY_BTR.JSC_MON","login":"0015471"}
{"table_name":"BTR.DAILY_BTR.ESHOP.JSC_MON","login":"0015471"}
The type of this column is 'string'. I need to get the value for the key table_name. I tried to use the json method .loads:
sparam = t1.select(col('ADD_PARAMS'))
json.loads(sparam)
But I got an error:
TypeError: the JSON object must be str, bytes or bytearray, not DataFrame
Then I tried to change the column type:
sparam = t1.select(col('ADD_PARAMS').cast('string'))
type(sparam)
It shows that the type is dataframe:
pyspark.sql.dataframe.DataFrame
Anyway, I tried to use the "loads" method again:
json.loads(sparam)
But I got the same error:
TypeError: the JSON object must be str, bytes or bytearray, not DataFrame
I tried different options to get the value of table_name, ranging from converting to JSON, to dict, to regex and splits, but nothing helped.
UPD
Here is some of the code that is used:
import io
import sys
import pandas as pd
import numpy as np
import json
import findspark
findspark.init('/mnt/nfs-spark/spark-2.3.3/')
findspark.find()
import pyspark
import pyspark.sql.functions as F
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql import Window, HiveContext
from pyspark.sql.functions import col, lit
from pyspark.sql import Row
from pyspark.sql.functions import udf
survey_requests = spark.read.parquet('/mnt/gluster-storage/etl/download/surv/survey_requests/*')
channels = spark.read.parquet('/mnt/gluster-storage/etl/download/surv/channels')
channel_segments = spark.read.parquet('/mnt/gluster-storage/etl/download/surv/channel_segments')
channel_branches = spark.read.parquet('/mnt/gluster-storage/etl/download/surv/channel_branches')
channel_touchpoints = spark.read.parquet('/mnt/gluster-storage/etl/download/surv/channel_touchpoints')
t1 = (c.join(cs, c.IDD_SEGMENT == cs.ID_SEGMENT, "left")
      .join(ct, ct.ID_TOUCHPOINT == c.IDD_TOUCHPOINT, "left")
      .join(cb, cb.ID_BRANCH == c.IDD_BRANCH, "left")
      .join(survey_requests, c.ID_CHANNEL == survey_requests.CHANNEL_ID, "right"))
sparam = t1.select(col('ADD_PARAMS').cast('string'))
type(sparam)
pyspark.sql.dataframe.DataFrame
json.loads(sparam)
TypeError: the JSON object must be str, bytes or bytearray, not DataFrame
Solution 1:[1]
It's .cast('string'), not .cast('str')
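As a side note on the TypeError itself: json.loads parses one Python string at a time, while t1.select(...) always returns a DataFrame, so passing the selection to json.loads can never work, regardless of the column's cast. Parsing has to happen per value. A minimal sketch using the two sample strings from the question (plain Python, outside Spark):

```python
import json

# Example values as they appear in the ADD_PARAMS column
rows = [
    '{"table_name":"BTR.DAILY_BTR.JSC_MON","login":"0015471"}',
    '{"table_name":"BTR.DAILY_BTR.ESHOP.JSC_MON","login":"0015471"}',
]

# json.loads works on one string at a time, not on a DataFrame
table_names = [json.loads(s)["table_name"] for s in rows]
print(table_names)
# ['BTR.DAILY_BTR.JSC_MON', 'BTR.DAILY_BTR.ESHOP.JSC_MON']
```

For 200 000 records it is cheaper to stay inside Spark and use the built-in get_json_object from pyspark.sql.functions, e.g. t1.select(get_json_object(col('ADD_PARAMS'), '$.table_name').alias('table_name')) — a sketch assuming ADD_PARAMS holds valid JSON strings.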
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pltc |
