Coalescing rows in PySpark with a string operation
Solution 1:[1]
Another way:
from pyspark.sql.functions import array_join, collect_list, concat, lit

df = (df2.groupBy('ID', 'fname', 'lname', 'Score')
      .agg(collect_list('Activity').alias('Activity'))  # collect the Activity values into an array
      .withColumn('Activity', concat(array_join('Activity', '.'), lit('.'))))  # join on '.' and append a trailing dot
df.show(truncate=False)
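For concreteness, here is a minimal sketch of the input/output shape, assuming a hypothetical df2 (the sample names and scores are illustrative, not from the original question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Hypothetical input: two activity rows for the same person
df2 = spark.createDataFrame(
    [(1, 'Bob', 'Smith', 4.2, 'swimming'), (1, 'Bob', 'Smith', 4.2, 'gym')],
    ['ID', 'fname', 'lname', 'Score', 'Activity'])
# After the groupBy/agg above, the two rows coalesce into one row with
# Activity = 'swimming.gym.'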
Solution 2:[2]
You can use concat_ws with collect_list.
Note that this won't guarantee the order of the sentences, since Spark tables are unordered. If the order matters, you need a sequential column to sort on (see the order-preserving sketch after the code below).
from pyspark.sql import functions as F

(df.groupBy('ID')
 .agg(*[F.first(x).alias(x) for x in ['fname', 'lname', 'Score']],  # first() carries the per-group constant columns through the aggregation
      F.concat_ws(' ', F.collect_list(F.col('Activity'))).alias('Activity'))
)
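If order matters, a minimal order-preserving sketch follows. It assumes a hypothetical seq column recording the original sentence order (not part of the question's data; one could add it upstream, e.g. with monotonically_increasing_id at ingestion):

from pyspark.sql import functions as F

result = (df.groupBy('ID')
          .agg(*[F.first(x).alias(x) for x in ['fname', 'lname', 'Score']],
               # Collect (seq, Activity) structs so each string keeps its position
               F.array_sort(F.collect_list(F.struct('seq', 'Activity'))).alias('pairs'))
          # array_sort compares structs field by field, so 'seq' acts as the sort key;
          # extract the Activity field from the sorted structs and join with spaces
          .withColumn('Activity', F.concat_ws(' ', F.col('pairs.Activity')))
          .drop('pairs'))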
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | wwnde |
| Solution 2 | Emma |