'PySpark: How to Transpose multiple columns in a Dataframe
I am new to Spark and need a help with transposing the below input dataframe into the desired output df (Rows to Columns) using PySpark or Spark Sql.
Input Dataframe-
A B C D
1 2 3 4
10 11 12 13
......
........
Required Output (transposed) data
A 1
B 2
C 3
D 4
A 11
B 12
C 13
D 14
....
......
It is better if I can pivot the input data (columns) as per our requirement.
Solution 1:[1]
You can make a generalized function like below (inspired from my previous answer here):
def stack_multiple(data,cols=None,output_columns=["col","values"]):
"""stacks multiple columns in a dataframe,
takes all columns by default unless passed a list of values"""
cols = data.columns if cols is None else cols
cols= [cols] if isinstance(cols,str) else cols
return data.selectExpr(f"""stack({len(cols)},{','.join(map(','.join,
(zip([f'"{i}"' for i in cols],[f"`{i}`" for i in cols]))))})
as ({','.join(output_columns)})""")
Sample Runs:
stack_multiple(df).show()
+---+------+
|col|values|
+---+------+
| A| 1|
| B| 2|
| C| 3|
| D| 4|
| A| 10|
| B| 11|
| C| 12|
| D| 13|
+---+------+
stack_multiple(df,['A','B'],output_columns=['A','B']).show()
+---+---+
| A| B|
+---+---+
| A| 1|
| B| 2|
| A| 10|
| B| 11|
+---+---+
Solution 2:[2]
Use stack function in spark.
Example:
df.show()
#+---+---+---+---+
#| A| B| C| D|
#+---+---+---+---+
#| 1| 2| 3| 4|
#| 10| 11| 12| 13|
#+---+---+---+---+
from pyspark.sql.functions import *
df.selectExpr("stack(4,'A',A,'B',B,'C',C,'D',D) as (key,value)").show()
#+---+-----+
#|key|value|
#+---+-----+
#| A| 1|
#| B| 2|
#| C| 3|
#| D| 4|
#| A| 10|
#| B| 11|
#| C| 12|
#| D| 13|
#+---+-----+
Solution 3:[3]
df = spark.createDataFrame([[1,2,3,4],[10,11,12,13]]).toDF('A','B','C','D')
+---+---+---+---+
| A| B| C| D|
+---+---+---+---+
| 1| 2| 3| 4|
| 10| 11| 12| 13|
+---+---+---+---+
df.select(
F.explode(
F.create_map(
*reduce(lambda x, y: x+y,
[[F.lit(col), col] for col in df.columns]
)
)
)
)
+---+-----+
|key|value|
+---+-----+
| A| 1|
| B| 2|
| C| 3|
| D| 4|
| A| 10|
| B| 11|
| C| 12|
| D| 13|
+---+-----+
Solution 4:[4]
Try this :
SELECT json_build_object( 'result', json_agg(json_build_object( 'value', c.content)))
FROM unnest(string_to_array('the quick; lazy fox', ';')) AS c(content)
Result :
{"result" : [{"value" : "the quick"}, {"value" : " lazy fox"}]}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | notNull |
| Solution 3 | mck |
| Solution 4 | Edouard |
