How can I convert a row from a dataframe in pyspark to a column but keep the column names? - pyspark or python

I have a list (`all_data`) made up of several inner lists, one per column, along with a list of column names (`cols`).



Solution 1:[1]

Zip the inner lists together to transpose them into rows, then pass the result to `spark.createDataFrame`:

df = spark.createDataFrame(zip(*all_data), cols)

df.show(truncate=False)
+-----------------------------+-----------+
|name                         |chromossome|
+-----------------------------+-----------+
|NM_019112.4(ABCA7):c.161-2A>T|19p13.3    |
|CCL2, 767C-G                 |17q11.2-q12|
+-----------------------------+-----------+
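The transposition happens in plain Python before Spark is involved: `zip(*all_data)` turns the per-column lists into per-row tuples. A minimal sketch without Spark, using sample data modelled on the output table above (the contents of `all_data` are an assumption, since the question does not show them):

```python
# Hypothetical sample: each inner list holds one column's values,
# mirroring the structure the answer assumes for all_data.
all_data = [
    ["NM_019112.4(ABCA7):c.161-2A>T", "CCL2, 767C-G"],  # name column
    ["19p13.3", "17q11.2-q12"],                          # chromossome column
]
cols = ["name", "chromossome"]

# zip(*all_data) transposes the column lists into row tuples,
# which is the shape createDataFrame expects alongside cols.
rows = list(zip(*all_data))
print(rows[0])  # ('NM_019112.4(ABCA7):c.161-2A>T', '19p13.3')
```

Each tuple in `rows` becomes one row of the resulting DataFrame, paired positionally with the names in `cols`.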

Or, if the inner lists have unequal lengths, use `zip_longest` to pad the shorter ones:

from itertools import zip_longest
df = spark.createDataFrame(zip_longest(*all_data, fillvalue=''), cols)
df.show()
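The difference from plain `zip` is the padding behaviour: `zip` stops at the shortest inner list, while `zip_longest` fills the gaps with `fillvalue`. A small illustration with hypothetical uneven lists (not from the question's data):

```python
from itertools import zip_longest

# Hypothetical columns of unequal length.
all_data = [["a", "b", "c"], ["x"]]

# zip would yield only [('a', 'x')]; zip_longest pads the short column.
rows = list(zip_longest(*all_data, fillvalue=''))
print(rows)  # [('a', 'x'), ('b', ''), ('c', '')]
```

With `zip` the rows beyond the shortest column would be silently dropped, so `zip_longest` is the safer choice when column lengths are not guaranteed to match.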

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
