AssertionError: dataType StringType() should be an instance of <class 'pyspark.sql.types.DataType'> in pyspark
I am trying to generalize schema creation for empty tables in PySpark. My list holds a column name and a datatype separated by a space.
Below is my code.
I can generalize the column name, but the datatype string is not converted into an actual type.
from pyspark.sql.types import *
tblColumns = ['emp_name StringType()',
              'confidence DoubleType()',
              'addressType StringType()',
              'reg StringType()',
              'inpindex IntegerType()']
def createEmptyTable(tblColumns):
    structCols = [StructField(colName.split(' ')[0], (colName.split(' ')[1]), True)
                  for colName in tblColumns]
    print('Returning cols', structCols)
    return structCols

createEmptyTable(tblColumns)
This gives the error below.
AssertionError: dataType StringType() should be an instance of <class 'pyspark.sql.types.DataType'>
Is there a way to make the datatype generic?
Solution 1:[1]
Yes, well, it's throwing an error because you are passing a string, not a DataType instance.
You need to convert the string to the corresponding type through a mapping.
So, for example, instead of (colName.split(' ')[1]) you should use a mapping table:
from pyspark.sql.types import *

# Map type names to their DataType classes; extend with every type you use
datatype = {
    'StringType': StringType,
    'DoubleType': DoubleType,
    'IntegerType': IntegerType,
}

def createEmptyTable(tblColumns):
    # split(' ')[1] gives e.g. 'StringType()'; strip the '()' before the lookup
    structCols = [StructField(colName.split(' ')[0],
                              datatype[colName.split(' ')[1].rstrip('()')](),
                              True)
                  for colName in tblColumns]
    return structCols
This approach should work; be aware that you will have to declare a mapping entry for every type you use.
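As a quick sanity check, here is a minimal usage sketch (the SparkSession variable spark is an assumption, created here via getOrCreate): the returned fields are wrapped in a StructType and passed to createDataFrame with an empty list, which yields an empty table with the desired schema.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder.getOrCreate()

# Wrap the StructField list in a StructType and create an empty DataFrame
schema = StructType(createEmptyTable(tblColumns))
emptyDf = spark.createDataFrame([], schema)
emptyDf.printSchema()

If you would rather not maintain the mapping by hand, one alternative is to look the class up on the pyspark.sql.types module itself, e.g. getattr(pyspark.sql.types, colName.split(' ')[1].rstrip('()'))(), at the cost of accepting any class name that appears in the input.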
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Benny Elgazar |
