AssertionError: dataType StringType() should be an instance of <class 'pyspark.sql.types.DataType'> in pyspark
I am trying to generalize schema creation for empty tables in PySpark. My list holds a column name and a datatype separated by a space.
Below is my code.
I can generalize the column name, but the datatype string is not converted into an actual type.
from pyspark.sql.types import *
tblColumns = ['emp_name StringType()',
              'confidence DoubleType()',
              'addressType StringType()',
              'reg StringType()',
              'inpindex IntegerType()']
def createEmptyTable(tblColumns):
    structCols = [StructField(colName.split(' ')[0], (colName.split(' ')[1]), True)
                  for colName in tblColumns]
    print('Returning cols', structCols)
    return structCols

createEmptyTable(tblColumns)
This gives the error below.
AssertionError: dataType StringType() should be an instance of <class 'pyspark.sql.types.DataType'>
Is there a way to make the datatype generic?
Solution 1:[1]
Yes, well, it's throwing an error because you are passing a string, not a DataType instance.
You need to convert the string to the corresponding type through a mapping.
So, for example, instead of (colName.split(' ')[1]) you should use a mapping table:
from pyspark.sql.types import *

# Map type names to their DataType classes; extend with every type you use
datatype = {
    'StringType': StringType,
    'DoubleType': DoubleType,
    'IntegerType': IntegerType,
}

def createEmptyTable(tblColumns):
    # split(' ')[1] gives e.g. 'StringType()'; strip the '()' before the lookup
    structCols = [StructField(colName.split(' ')[0],
                              datatype[colName.split(' ')[1].rstrip('()')](),
                              True)
                  for colName in tblColumns]
    return structCols
This approach should work; be aware that you will have to declare a mapping entry for every type you use.
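As a quick sanity check, here is a minimal usage sketch (the SparkSession variable spark is an assumption, created here via getOrCreate): the returned fields are wrapped in a StructType and passed to createDataFrame with an empty list, which yields an empty table with the desired schema.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder.getOrCreate()

# Wrap the StructField list in a StructType and create an empty DataFrame
schema = StructType(createEmptyTable(tblColumns))
emptyDf = spark.createDataFrame([], schema)
emptyDf.printSchema()

If you would rather not maintain the mapping by hand, one alternative is to look the class up on the pyspark.sql.types module itself, e.g. getattr(pyspark.sql.types, colName.split(' ')[1].rstrip('()'))(), at the cost of accepting any class name that appears in the input.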
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Benny Elgazar |
