Export JSON string to CSV from Teradata database using PySpark

I am trying to read data from a Teradata database into a DataFrame and write it to my local system using the DataFrame.write method.

However, I am not able to read a JSON string column from the Teradata table. What could be the possible solution?

Code:

    from pyspark import SparkConf
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F
    import time

    appName = "Pyspark Teradata"
    master = "local"
    conf = SparkConf()  # create the configuration
    conf.set("spark.repl.local.jars", "<path_to_terajdbc4.jar>")
    conf.set("spark.executor.extraClassPath", "<path_to_terajdbc4.jar>")
    conf.set("spark.driver.extraClassPath", "<path_to_terajdbc4.jar>")

    # Create Spark session
    spark = SparkSession.builder \
        .config(conf=conf) \
        .appName(appName) \
        .master(master) \
        .getOrCreate()

    driver = 'com.teradata.jdbc.TeraDriver'

    # Define the function to load data from Teradata

    a = time.time()
    def load_data(driver, jdbc_url, sql, user, password):
        return spark.read \
            .format('jdbc') \
            .option('multiLine', 'true') \
            .option('url', jdbc_url) \
            .option('encoding', 'UTF-8') \
            .option('dbtable', '({sql}) as src'.format(sql=sql)) \
            .option('user', user) \
            .option('password', password) \
            .option('driver', driver) \
            .option('STRICT_NAMES', 'OFF') \
            .load()

    sql = "Select * from AdventureWorksDW.my_table_2"
    url = "jdbc:teradata://192.168.xx.xx"
    user = "dbc"
    password = "dbc"

    df_td = load_data(driver, url, sql, user, password)
    df_td.show()

    b = time.time()
    print("Time taken", b - a)

The error that I am getting:

    py4j.protocol.Py4JJavaError: An error occurred while calling o50.load.
    : java.sql.SQLException: Unsupported type OTHER
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:247)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:312)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:312)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Unknown Source)
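For context, `Unsupported type OTHER` typically means the Teradata JDBC driver reports the JSON column as `java.sql.Types.OTHER`, which Spark's JDBC reader cannot map to a Catalyst type. A common workaround is to CAST the JSON column to VARCHAR inside the pushed-down query, so the driver reports a type Spark understands. A minimal sketch, where the helper, the `json_col` column, and the VARCHAR width are all my assumptions, not part of the original post:

```python
# Sketch: build a SELECT that casts JSON columns to VARCHAR so the JDBC
# driver reports a character type instead of java.sql.Types.OTHER.
def cast_json_columns(table, json_cols, other_cols=(), width=32000):
    """Return a SELECT statement casting each JSON column to VARCHAR(width)."""
    select_list = list(other_cols) + [
        "CAST({c} AS VARCHAR({w})) AS {c}".format(c=c, w=width)
        for c in json_cols
    ]
    return "SELECT {cols} FROM {table}".format(
        cols=", ".join(select_list), table=table
    )

# Hypothetical column names; pass the result to load_data() above in place
# of the plain SELECT *.
sql = cast_json_columns("AdventureWorksDW.my_table_2",
                        json_cols=["json_col"], other_cols=["id"])
# -> SELECT id, CAST(json_col AS VARCHAR(32000)) AS json_col FROM AdventureWorksDW.my_table_2
```

Whether a VARCHAR of that width is enough depends on the size of the stored JSON documents; for larger documents a CLOB cast may be needed, with its own driver mapping caveats.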

Source Data in Teradata:

(screenshot of the source table, showing a JSON-type column)

How can I output JSON-type data to a CSV file using PySpark?
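Once the JSON column comes through as a plain string, the remaining concern when writing CSV is that JSON text contains commas and double quotes, so the writer's quoting must be RFC 4180 style. The stdlib `csv` module illustrates the round-trip (the row contents here are made up for the demo); in Spark the analogous write would be `df_td.write.option("header", "true").option("quote", '"').option("escape", '"').csv("<output_path>")`:

```python
import csv
import io
import json

# Stand-in for a row read from Teradata: a JSON string alongside other fields.
row = {"id": "1", "payload": json.dumps({"name": "Ann", "tags": ["a", "b"]})}

# Write it as CSV; QUOTE_MINIMAL wraps fields containing delimiters in quotes
# and doubles any embedded quotes, so the JSON survives intact.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "payload"],
                        quoting=csv.QUOTE_MINIMAL)
writer.writeheader()
writer.writerow(row)
csv_text = buf.getvalue()

# Round-trip: parse the CSV back and recover the original JSON document.
back = next(csv.DictReader(io.StringIO(csv_text)))
assert json.loads(back["payload"]) == {"name": "Ann", "tags": ["a", "b"]}
```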



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
