How can I convert a row from a dataframe in pyspark to a column but keep the column names? - pyspark or python
I have an array that is made up of several arrays.
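Judging from the output shown in the answer below, the input is presumably organized column-wise, roughly like the following sketch (the names all_data and cols and the exact values are assumptions reconstructed from that output, not shown in the original question):

# Hypothetical input, one inner list per column
all_data = [
    ['NM_019112.4(ABCA7):c.161-2A>T', 'CCL2, 767C-G'],  # values for the "name" column
    ['19p13.3', '17q11.2-q12'],                          # values for the "chromossome" column
]
cols = ['name', 'chromossome']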
Solution 1:[1]
Zip the lists together and pass the result to createDataFrame:
df = spark.createDataFrame(zip(*all_data), cols)  # zip(*all_data) turns the column lists into row tuples
df.show(truncate=False)
+-----------------------------+-----------+
|name |chromossome|
+-----------------------------+-----------+
|NM_019112.4(ABCA7):c.161-2A>T|19p13.3 |
|CCL2, 767C-G |17q11.2-q12|
+-----------------------------+-----------+
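For reference, the zip call is what performs the transposition: it pairs the column-wise lists into row tuples, which createDataFrame then matches positionally against cols. A minimal illustration, assuming the all_data sketched above:

rows = list(zip(*all_data))
# [('NM_019112.4(ABCA7):c.161-2A>T', '19p13.3'),
#  ('CCL2, 767C-G', '17q11.2-q12')]
# each tuple becomes one row of the DataFrame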
Or with zip_longest:
from itertools import zip_longest
df = spark.createDataFrame(zip_longest(*all_data, fillvalue=''), cols)
df.show()
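The difference matters when the inner lists are not all the same length: zip stops at the shortest list and silently drops the extra values, while zip_longest pads the shorter lists with fillvalue. A quick sketch with hypothetical uneven data:

from itertools import zip_longest

uneven = [['a', 'b', 'c'], [1, 2]]        # first column has one extra value
list(zip(*uneven))                        # [('a', 1), ('b', 2)]          -- 'c' is dropped
list(zip_longest(*uneven, fillvalue=''))  # [('a', 1), ('b', 2), ('c', '')]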
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow