Creating a DataFrame from lists and string values in PySpark

I have 3 string values and 4 lists from which I'm trying to create a DataFrame.

Lists -

>>> exp_type
[u'expect_column_to_exist', u'expect_column_values_to_not_be_null', u'expect_column_values_to_be_of_type', u'expect_column_to_exist', u'expect_table_row_count_to_equal', u'expect_column_sum_to_be_between']

>>> expectations
[{u'column': u'country'}, {u'column': u'country'}, {u'column': u'country', u'type_': u'StringType'}, {u'column': u'countray'}, {u'value': 10}, {u'column': u'active_stores', u'max_value': 1000, u'min_value': 100}]

>>> results
[{}, {u'partial_unexpected_index_list': None, u'unexpected_count': 0, u'unexpected_percent': 0.0, u'partial_unexpected_list': [], u'partial_unexpected_counts': [], u'element_count': 102}, {u'observed_value': u'StringType'}, {}, {u'observed_value': 102}, {u'observed_value': 22075.0}]

>>> this_exp_success
[True, True, True, False, False, False]

Strings -

>>> is_overall_success
'False'

>>> data_source
'stores'
>>> run_time
'2022-02-24T05:43:16.678220+00:00'

I'm trying to create the DataFrame as below.

columns = ['data_source', 'run_time', 'exp_type', 'expectations', 'results', 'this_exp_success', 'is_overall_success']

dataframe = spark.createDataFrame(zip(data_source, run_time, exp_type, expectations, results, this_exp_success, is_overall_success), columns)

Error -

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 748, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 416, in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 348, in _inferSchemaFromList
    schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1101, in _merge_type
    for f in a.fields]
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1114, in _merge_type
    _merge_type(a.valueType, b.valueType, name='value of map %s' % name),
  File "/usr/lib/spark/python/pyspark/sql/types.py", line 1094, in _merge_type
    raise TypeError(new_msg("Can not merge type %s and %s" % (type(a), type(b))))
TypeError: value of map field results: Can not merge type <class 'pyspark.sql.types.LongType'> and <class 'pyspark.sql.types.StringType'>
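Two separate problems feed into this failure. First, `zip` treats a Python string as a sequence of characters, so `zip(data_source, run_time, ...)` pairs `'s'`, `'t'`, `'o'`... with the list elements instead of repeating the whole string on every row. Second, the `results` dicts mix value types (`102` is a long, `u'StringType'` is a string), so Spark cannot infer a single `MapType` value type, which is what the `Can not merge type` error is about. A minimal illustration of the `zip` issue (hypothetical two-element data, not the full lists from the question):

```python
data_source = 'stores'
exp_type = ['expect_column_to_exist', 'expect_column_values_to_not_be_null']

# zip iterates the string character by character, so each "row"
# gets one character of 'stores', not the whole value.
pairs = list(zip(data_source, exp_type))
print(pairs)
# [('s', 'expect_column_to_exist'), ('t', 'expect_column_values_to_not_be_null')]
```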

Expected Output


+---------------+--------------------------------+--------------------------------------+---------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+------------------+                                                                    
| data_source   |  run_time                      |exp_type                              |expectations                                                               |results                                                                                                                                                                                |this_exp_success|is_overall_success|
+---------------+--------------------------------+--------------------------------------+---------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+------------------+
|stores         |2022-02-24T05:43:16.678220+00:00|expect_column_to_exist                |{u'column': u'country'}                                                    |{}                                                                                                                                                                                     |True            |False             |
|stores         |2022-02-24T05:43:16.678220+00:00|expect_column_values_to_not_be_null   |{u'column': u'country'}                                                    |{u'partial_unexpected_index_list': None, u'unexpected_count': 0, u'unexpected_percent': 0.0, u'partial_unexpected_list': [], u'partial_unexpected_counts': [], u'element_count': 102}  |True            |False             |
|stores         |2022-02-24T05:43:16.678220+00:00|expect_column_values_to_be_of_type    |{u'column': u'country', u'type_': u'StringType'}                           |{u'observed_value': u'StringType'}                                                                                                                                                     |True            |False             |
|stores         |2022-02-24T05:43:16.678220+00:00|expect_column_to_exist                |{u'column': u'countray'}                                                   |{}                                                                                                                                                                                     |False           |False             |
|stores         |2022-02-24T05:43:16.678220+00:00|expect_table_row_count_to_equal       |{u'value': 10}                                                             |{u'observed_value': 102}                                                                                                                                                               |False           |False             |
|stores         |2022-02-24T05:43:16.678220+00:00|expect_column_sum_to_be_between       |{u'column': u'active_stores', u'max_value': 1000, u'min_value': 100}       |{u'observed_value': 22075.0}                                                                                                                                                           |False           |False             |
+---------------+--------------------------------+--------------------------------------+---------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+------------------+
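One way to get this output (a sketch, assuming an active `spark` session; shown here with an abbreviated two-row subset of the data): repeat the three scalar values on every row with a list comprehension instead of zipping them, and serialize the mixed-value dicts to JSON strings so each column has one uniform type and schema inference no longer has to merge `LongType` with `StringType`.

```python
import json

# Scalars from the question.
data_source = 'stores'
run_time = '2022-02-24T05:43:16.678220+00:00'
is_overall_success = 'False'

# Abbreviated versions of the lists from the question.
exp_type = ['expect_column_to_exist', 'expect_table_row_count_to_equal']
expectations = [{'column': 'country'}, {'value': 10}]
results = [{}, {'observed_value': 102}]
this_exp_success = [True, False]

columns = ['data_source', 'run_time', 'exp_type', 'expectations',
           'results', 'this_exp_success', 'is_overall_success']

# Repeat the scalars per row; json.dumps turns each dict into a plain
# string, giving the 'expectations' and 'results' columns a single type.
rows = [
    (data_source, run_time, e, json.dumps(exp), json.dumps(res),
     s, is_overall_success)
    for e, exp, res, s in zip(exp_type, expectations, results,
                              this_exp_success)
]

# With a running SparkSession:
# dataframe = spark.createDataFrame(rows, columns)
```

If you need the dict columns back as structures later, `from_json` with an explicit schema can parse the JSON strings inside Spark.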


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
