How to generate columns based on the unique values of a particular column in PySpark?
I have a dataframe as below
+----------+------------+---------------------+
|CustomerNo|size |total_items_purchased|
+----------+------------+---------------------+
| 208261.0| A | 2|
| 208263.0| C | 1|
| 208261.0| E | 1|
| 208262.0| B | 2|
| 208264.0| D | 3|
+----------+------------+---------------------+
I have another dataframe, df, that contains only CustomerNo values. I have to create one column per unique value of `size` and fill in the corresponding total_items_purchased in df.
My df table should look like this:
+----------+------+------+------+------+------+
|CustomerNo|size_A|size_B|size_C|size_D|size_E|
+----------+------+------+------+------+------+
|  208261.0|     2|     0|     0|     0|     1|
|  208262.0|     0|     2|     0|     0|     0|
|  208263.0|     0|     0|     1|     0|     0|
|  208264.0|     0|     0|     0|     3|     0|
+----------+------+------+------+------+------+
Can anyone tell me how to do this?
Solution 1:[1]
You can use the `pivot` function to rearrange the table:

from pyspark.sql import functions as F

df = (df.groupBy('CustomerNo')
        .pivot('size')
        .agg(F.first('total_items_purchased'))
        .na.fill(0))

Note that `pivot` names the new columns after the distinct values themselves (A, B, ...); if you want the size_A, size_B, ... names shown above, rename the pivoted columns afterwards.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Emma |
