Create dataframe with new columns derived from unique values in a single column
I have a dataframe formatted like this:
| id | fieldname | fieldvalue |
|---|---|---|
| 1 | PC | Dell |
| 1 | Phone | Pixel 6 |
| 2 | PC | Lenovo |
| 3 | Phone | Samsung |
I would like to transform it to:
| id | PC | Phone |
|---|---|---|
| 1 | Dell | Pixel 6 |
| 2 | Lenovo | |
| 3 | | Samsung |
In other words, create one column per distinct value in the fieldname column and fill it with the corresponding value from fieldvalue.
How would I do that in PySpark?
Solution 1:[1]
This is a row-to-column (long-to-wide) problem, so you should use pivot:
from pyspark.sql import functions as F

df = df.groupBy('id').pivot('fieldname').agg(F.first('fieldvalue'))
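As a minimal end-to-end sketch, assuming a local SparkSession and the sample data from the question (row order in the output may vary):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pivot-example").getOrCreate()

# Sample data mirroring the question's input table
df = spark.createDataFrame(
    [(1, "PC", "Dell"), (1, "Phone", "Pixel 6"),
     (2, "PC", "Lenovo"), (3, "Phone", "Samsung")],
    ["id", "fieldname", "fieldvalue"],
)

# pivot() creates one column per distinct fieldname;
# first() picks the single fieldvalue for each (id, fieldname) pair
result = df.groupBy("id").pivot("fieldname").agg(F.first("fieldvalue"))
result.show()
# +---+------+-------+
# | id|    PC|  Phone|
# +---+------+-------+
# |  1|  Dell|Pixel 6|
# |  2|Lenovo|   null|
# |  3|  null|Samsung|
# +---+------+-------+
```

Missing combinations come out as null. If the distinct fieldname values are known in advance, passing them explicitly, e.g. pivot('fieldname', ['PC', 'Phone']), avoids an extra pass over the data to collect them.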
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | 过过招 |
